Lesson 2 of the Udacity Course UD120 – Intro to Machine Learning deals with Naive Bayes classification.
Mini project
For the mini project, fork https://github.com/udacity/ud120-projects and clone your fork. A 64-bit build of Python 2.7 is recommended, because machine learning involves heavy data processing and can easily use more than 2 GB of memory.
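If you are not sure whether your interpreter is a 64-bit build, a quick check like the following should tell you. This is just a small sketch using the standard library; the variable names are my own.

```python
import struct
import sys

# Size of a C pointer in bits: 64 on a 64-bit interpreter, 32 on a 32-bit one.
pointer_bits = struct.calcsize("P") * 8

# Alternative check: sys.maxsize only exceeds 2**32 on 64-bit builds.
is_64bit = sys.maxsize > 2**32

print("Python %d.%d, %d-bit" % (sys.version_info[0], sys.version_info[1], pointer_bits))
```

Both checks should agree; if they report 32-bit, the interpreter will be limited in how much memory it can address.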
Dependencies
After cloning the repo, I would recommend setting up a venv and installing the requirements:
- scikit-learn (imported as sklearn)
- numpy
- scipy
- matplotlib
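To confirm that the environment is complete, you could run a small check like the one below. It is only a sketch: check_modules and REQUIRED_MODULES are names I made up, and the version lookup assumes each package exposes a __version__ attribute (it falls back to "unknown" otherwise).

```python
import importlib

# Import names of the packages listed above; note that scikit-learn
# is imported under the name "sklearn".
REQUIRED_MODULES = ["sklearn", "numpy", "scipy", "matplotlib"]


def check_modules(names):
    """Map each module name to its version string, or None if the import fails."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", "unknown")
    return found


status = check_modules(REQUIRED_MODULES)
for name in REQUIRED_MODULES:
    print("%-12s %s" % (name, status[name] or "MISSING"))
```

Anything reported as MISSING still needs to be installed into the venv before the mini project will run.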
The Code
The code itself is pretty straightforward:
- Instantiate the classifier
- Train (fit) the classifier
- Predict
- Calculate accuracy
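from time import time

from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

# features_train, features_test, labels_train and labels_test are assumed
# to have been loaded already by the project's preprocessing code

# training
print("Start training")
t0 = time()
clf = GaussianNB()
clf.fit(features_train, labels_train)
print("training time:", round(time() - t0, 3), "s")

# prediction
print("start predicting")
t0 = time()
prediction = clf.predict(features_test)
print("predict time:", round(time() - t0, 3), "s")

# accuracy
print("Calculating accuracy")
accuracy = accuracy_score(labels_test, prediction)
print("Accuracy calculated, and the accuracy is", accuracy)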
The output on my machine:
training time: 1.762 s
start predicting
predict time: 0.286 s
Calculating accuracy
Accuracy calculated, and the accuracy is 0.9732650739476678
The simple Gaussian Naive Bayes classifier is quite accurate here, at about 97.3%.
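If you want to try the same four steps without the course data, here is a minimal sketch on synthetic data. The two Gaussian blobs, the 75/25 split and the random seed are assumptions I picked for illustration; they have nothing to do with the course's data set, so the accuracy is not comparable to the number above. The last lines recompute the accuracy by hand, since accuracy_score with its default arguments is simply the fraction of correctly predicted labels.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the course data (an assumption for illustration):
# two well-separated Gaussian blobs in two dimensions.
rng = np.random.RandomState(0)
class_0 = rng.normal(loc=-2.0, scale=1.0, size=(200, 2))
class_1 = rng.normal(loc=2.0, scale=1.0, size=(200, 2))
features = np.vstack([class_0, class_1])
labels = np.array([0] * 200 + [1] * 200)

# Shuffle, then hold out the last 25% of the samples as a test set.
order = rng.permutation(len(labels))
features, labels = features[order], labels[order]
split = int(0.75 * len(labels))
features_train, features_test = features[:split], features[split:]
labels_train, labels_test = labels[:split], labels[split:]

# 1. instantiate, 2. train (fit), 3. predict, 4. calculate accuracy
clf = GaussianNB()
clf.fit(features_train, labels_train)
prediction = clf.predict(features_test)
accuracy = accuracy_score(labels_test, prediction)

# The same number by hand: the share of predictions that match the labels.
manual_accuracy = float(np.mean(prediction == labels_test))

print("accuracy: %.4f (by hand: %.4f)" % (accuracy, manual_accuracy))
```

Because the two blobs barely overlap, the classifier should get almost every test point right; real data such as the course's is usually less forgiving.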