Lesson 2 of the Udacity Course UD120 – Intro to Machine Learning deals with Naive Bayes classification.
Mini project
For the mini project, fork https://github.com/udacity/ud120-projects and clone your fork. A 64-bit Python 2.7 installation is recommended, because machine learning is data-heavy and can easily use more than 2 GB of memory.
Dependencies
After cloning the repo, I would recommend setting up a venv and installing the requirements:
- scikit-learn (imported as sklearn)
- numpy
- scipy
- matplotlib
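To confirm that the venv really has everything, a quick check like the one below can help. This is only a sketch: the helper name installed_version is my own, and it assumes the four packages are imported under the module names sklearn, numpy, scipy and matplotlib.

```python
from __future__ import print_function  # keeps print() usable on Python 2.7 too

import importlib


def installed_version(module_name):
    """Return the module's version string, "unknown" if it has none,
    or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


# Import names of the requirements listed above (assumed, see lead-in)
REQUIRED = ["sklearn", "numpy", "scipy", "matplotlib"]

for name in REQUIRED:
    version = installed_version(name)
    print(name, "->", version if version is not None else "NOT INSTALLED")
```

Any line that reports NOT INSTALLED points to a package that still has to be installed into the venv.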
The Code
The code itself is pretty straightforward:
- Instantiate the classifier
- Train (fit) the classifier
- Predict
- Calculate accuracy
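# keeps the print() calls below working on Python 2.7 as well
from __future__ import print_function
from time import time
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
# features_train, features_test, labels_train and labels_test are assumed
# to be loaded already by the project's starter code
# training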
print("Start training")
t0 = time()
clf = GaussianNB()
clf.fit(features_train, labels_train)
print("training time:", round(time() - t0, 3), "s")
# prediction
print("start predicting")
t0 = time()
prediction = clf.predict(features_test)
print("predict time:", round(time() - t0, 3), "s")
# accuracy
print("Calculating accuracy")
accuracy = accuracy_score(labels_test, prediction)
print("Accuracy calculated, and the accuracy is", accuracy)
The output on my machine:
training time: 1.762 s
start predicting
predict time: 0.286 s
Calculating accuracy
Accuracy calculated, and the accuracy is 0.9732650739476678
With an accuracy of about 97.3% on the test set, the simple Gaussian Naive Bayes classifier is pretty accurate.
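For anyone wondering what that number means: accuracy is simply the fraction of test samples whose predicted label matches the true label, which is also what accuracy_score returns with its default settings. The sketch below computes it by hand. The function name accuracy and the toy labels are made up for illustration and are not taken from the project data.

```python
from __future__ import division, print_function  # Python 2.7 compatibility


def accuracy(labels, predictions):
    """Return the fraction of predictions that match the true labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if len(labels) == 0:
        raise ValueError("need at least one label")
    # Count the positions where prediction and true label agree
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Hypothetical toy data: 8 samples, 2 of them misclassified
toy_labels = [0, 1, 1, 0, 1, 0, 0, 1]
toy_predictions = [0, 1, 0, 0, 1, 0, 1, 1]

toy_accuracy = accuracy(toy_labels, toy_predictions)
print("accuracy:", toy_accuracy)  # 6 of 8 correct -> 0.75
```

Read the same way, an accuracy of 0.973 says that roughly 97 out of every 100 test samples were classified correctly.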





