    from sklearn.tree import DecisionTreeClassifier

    # A node must hold at least 40 samples before it may be split further
    clf = DecisionTreeClassifier(min_samples_split=40)
    clf.fit(features_train, labels_train)

The course consists of 17 lessons. Every lesson has a couple of hours of video and lots and lots of quizzes in it.
- [x] Lesson 1: Only introduction 🙂
- [x] Lesson 2: Naive Bayes
- [x] Lesson 3: Support Vector Machines
- [x] Lesson 4: Decision Trees
- [x] Lesson 5: Choose your own algorithm
- [ ] Lesson 6: Datasets and questions
- [ ] Lesson 7: Regression
- [ ] Lesson 8: Outliers
- [ ] Lesson 9: Clustering
- [ ] Lesson 10: Feature Scaling
- [ ] Lesson 11: Text Learning
- [ ] Lesson 12: Feature Selection
- [ ] Lesson 13: PCA
- [ ] Lesson 14: Validation
- [ ] Lesson 15: Evaluation Metrics
- [ ] Lesson 16: Tying it all together
- [ ] Lesson 17: Final project
Lesson 2 of the Udacity Course UD120 – Intro to Machine Learning deals with Naive Bayes classification. Continue reading “Lesson 2: Naive Bayes”
The term Support Vector Machines (SVM) is a bit misleading. It is just a name for a very clever algorithm invented by two Russians in the 1960s. SVMs are used for classification and regression. Continue reading “Lesson 3: Support Vector Machines”

    import numpy as np

    a = np.array([1, 2, 3, 4])
    print(a)  # [1 2 3 4]

After working with Python and external dependencies for a couple of years, I’ve run into the same kinds of problems again and again.
Say you have a global Python installation under e.g. C:\Python27 on Windows. When you start working on your first Python project, you want to use external packages, and you encounter pip as the dependency management tool (pip has been part of the Python installation since 2.7.9 / 3.4). So far so good.
But you keep installing all the packages into your global Python installation. Continue reading “Python pip and virtualenv”
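One way out is to give every project its own isolated environment. As a minimal sketch, the following uses the `venv` module from the Python 3 standard library rather than the separate `virtualenv` package (so it does not apply to the Python 2.7 installation mentioned above). The helper name `create_project_env` and the `.venv` directory name are made up for illustration, and `with_pip=False` is chosen only to keep the example quick.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir):
    """Create an isolated environment in <project_dir>/.venv and return its path."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False skips bootstrapping pip, which keeps this sketch quick;
    # a real project would normally want with_pip=True.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project:
        env = create_project_env(project)
        # pyvenv.cfg marks the directory as a virtual environment.
        print((env / "pyvenv.cfg").exists())
```

Packages installed with the pip of such an environment end up inside the project's `.venv` directory instead of the global installation.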
While linting my code, jshint gave me the “hint” that I should prefer dot notation over bracket notation.
[‘testcase’] is better written in dot notation.
What is that?
- Accessing members with “.” is called “dot notation”.
- Accessing them with “[]” is called “bracket notation”.
Python is a dynamically typed language, which makes it easy and fun to program. But sometimes, especially in bigger projects, it can become quite cumbersome when you only receive errors at run time.
Given the hypothetical example where we define a function which multiplies integers: Continue reading “Python Type Checking”
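A function of the kind described above might look like the following sketch. The names `multiply` and `multiply_checked` are invented here and are not taken from the post. The point it tries to show: type annotations alone are not enforced at run time, so a wrong argument type slips through unless you check it yourself or run a separate static type checker over the code.

```python
def multiply(a: int, b: int) -> int:
    """Multiply two integers. The annotations are hints only."""
    return a * b


def multiply_checked(a: int, b: int) -> int:
    """Same as multiply, but rejects non-integer arguments at run time."""
    if not isinstance(a, int) or not isinstance(b, int):
        raise TypeError("multiply_checked expects two integers")
    return a * b


if __name__ == "__main__":
    print(multiply(3, 4))       # 12
    # No error here: str * int is valid Python, so the mistake goes unnoticed.
    print(multiply("ab", 2))    # abab
    try:
        multiply_checked("ab", 2)
    except TypeError as error:
        print("rejected:", error)
```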
Diving deeper into data science, I started to brush up on my knowledge of math, especially statistics.
The Mother of all Distributions
The normal distribution was formulated by Carl Friedrich Gauß in 18XX and can be implemented in Python as follows:

    import math

    def normal_distribution(x, mu=0, sigma=1):
        # The density is divided by the whole product sigma * sqrt(2 * pi)
        sqrt_two_pi = math.sqrt(2 * math.pi)
        return math.exp(-(x - mu) ** 2 / 2 / sigma ** 2) / (sqrt_two_pi * sigma)

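A quick way to sanity-check a hand-written density like the one above is to compare it against `statistics.NormalDist` from the standard library (Python 3.8+). The sketch below uses its own function name, `normal_pdf`, and arbitrarily chosen test points; note that the whole product of sigma and the square root of 2π belongs in the denominator, which only becomes visible when sigma differs from 1.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    # The whole product sigma * sqrt(2 * pi) sits in the denominator.
    coefficient = 1.0 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)


if __name__ == "__main__":
    # Arbitrary (x, mu, sigma) triples, including sigma != 1.
    for x, mu, sigma in [(0.0, 0.0, 1.0), (1.5, 1.0, 2.0), (0.3, 0.5, 0.25)]:
        ours = normal_pdf(x, mu, sigma)
        reference = NormalDist(mu, sigma).pdf(x)
        print(x, mu, sigma, math.isclose(ours, reference))
```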
To validate your model, you need to split your data into a training set and a test set.
More training data generally means a better model; more test data means a more reliable validation.
But because the amount of data available to train and test the model is limited, you have to decide on the ratio of training to test data.
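To make the trade-off concrete, here is a minimal plain-Python sketch of a holdout split. The function name `holdout_split` and its parameters are assumptions made for illustration; in practice a library routine such as scikit-learn's `train_test_split` is the more convenient choice.

```python
import random


def holdout_split(samples, test_ratio=0.4, seed=0):
    """Shuffle a copy of samples and split it into (train, test) lists."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(samples)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    data = list(range(150))
    for ratio in (0.1, 0.25, 0.4):
        train, test = holdout_split(data, test_ratio=ratio)
        print(ratio, len(train), len(test))
```

Every sample moved into the test set is one the model can no longer learn from, which is why the ratio has to be chosen deliberately.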

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn import datasets
    from sklearn import svm

    iris = datasets.load_iris()
    iris.data.shape, iris.target.shape

Sample a training set while holding out 40% of the data for testing:

    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.4, random_state=0)


    from sklearn.model_selection import cross_val_score

    clf = svm.SVC(kernel='linear', C=1)
    scores = cross_val_score(clf, iris.data, iris.target, cv=5)

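The `cv=5` argument asks for five folds: the data is cut into five parts, and each part serves once as the test set while the other four are used for training. The following plain-Python sketch shows that idea with contiguous, unshuffled folds. The function name `k_fold_indices` is invented for illustration, and this is not what scikit-learn does internally; for classifiers, scikit-learn stratifies the folds by class, which this sketch leaves out.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


if __name__ == "__main__":
    for train, test in k_fold_indices(150, k=5):
        print(len(train), len(test), test[0], test[-1])
```

Each sample lands in exactly one test fold, so all of the data contributes to both training and validation across the five runs.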