Lesson 2: Naive Bayes

Lesson 2 of the Udacity course UD120 – Intro to Machine Learning deals with Naive Bayes classification.

Mini project

For the mini project, fork https://github.com/udacity/ud120-projects and clone your fork. It is recommended to install a 64-bit Python 2.7, because machine learning involves heavy data processing and can easily use more than 2 GB of memory. That is around the limit of what a 32-bit process can address.
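To confirm which interpreter is actually running (for example inside a virtual environment), a small check like the following can help. It is a sketch that relies only on the standard library: `struct.calcsize("P")` returns the size of a C pointer in bytes, which is 4 on a 32-bit build and 8 on a 64-bit build.

```python
from __future__ import print_function

import struct
import sys


def interpreter_bits():
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    # calcsize("P") is the size of a C pointer in bytes: 4 on 32-bit, 8 on 64-bit builds
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "-", interpreter_bits(), "bit")
    if interpreter_bits() < 64:
        print("Warning: 32-bit interpreter, large datasets may exhaust its address space")
```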


After cloning the repo, I would recommend setting up a virtual environment (virtualenv on Python 2.7, venv on Python 3) and installing the requirements:

  • scikit-learn (imported as sklearn)
  • numpy
  • scipy
  • matplotlib
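Once the requirements are installed (for example with `pip install scikit-learn numpy scipy matplotlib` inside the activated environment), a quick import check reports anything that is still missing. This is a sketch using only the standard library's `importlib`; the helper name `missing_packages` is hypothetical.

```python
from __future__ import print_function

import importlib

# Import names of the packages listed above; scikit-learn is imported as "sklearn"
REQUIRED = ["sklearn", "numpy", "scipy", "matplotlib"]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All requirements are installed")
```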

The Code

The code itself is pretty straightforward:

  • Instantiate the classifier
  • Train (fit) the classifier
  • Predict
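  • Calculate accuracy

from __future__ import print_function  # Python 2.7: must be the first statement in the file
from time import time

from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

# features_train, features_test, labels_train and labels_test
# are provided by the mini project's starter code

# training
print("Start training")
t0 = time()
clf = GaussianNB()
clf.fit(features_train, labels_train)
print("training time:", round(time() - t0, 3), "s")

# prediction
print("start predicting")
t0 = time()
prediction = clf.predict(features_test)
print("predict time:", round(time() - t0, 3), "s")

# accuracy: fraction of test labels predicted correctly
print("Calculating accuracy")
accuracy = accuracy_score(labels_test, prediction)
print("Accuracy calculated, and the accuracy is", accuracy)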

The output on my machine:

training time: 1.762 s
start predicting
predict time: 0.286 s
Calculating accuracy
Accuracy calculated, and the accuracy is 0.9732650739476678
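With its default arguments, accuracy_score returns the fraction of test samples whose predicted label matches the true label, so 0.973 means roughly 973 of every 1000 test samples were classified correctly. A plain-Python equivalent makes the calculation explicit; the function name `accuracy` is hypothetical and this is only a sketch of the default behaviour, not scikit-learn's code.

```python
from __future__ import division


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label, as accuracy_score reports by default."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one label")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Three of four toy predictions are correct
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```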

The simple Gaussian Naive Bayes classifier is already quite accurate, at 97.3% on the test set.
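Gaussian Naive Bayes assumes that each feature is normally distributed within each class and that features are independent given the class. Prediction then picks the class with the largest log prior plus the sum of per-feature log likelihoods. The pure-Python sketch below illustrates that idea on a toy dataset. It is not how scikit-learn implements GaussianNB (which is vectorised with NumPy); the class name `TinyGaussianNB` and the toy data are hypothetical. The variance floor only loosely follows GaussianNB's `var_smoothing` parameter, which adds a fraction of the largest feature variance to every variance.

```python
from __future__ import division, print_function

import math
from collections import defaultdict


class TinyGaussianNB(object):
    """Minimal Gaussian Naive Bayes for illustration; not scikit-learn's implementation."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing

    @staticmethod
    def _variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    def fit(self, X, y):
        # Group training rows by class label
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        n_features = len(X[0])
        # Variance floor loosely modelled on var_smoothing; the 1e-12 minimum
        # (an assumption of this sketch) avoids log(0) when every feature is constant
        largest_var = max(self._variance([row[j] for row in X]) for j in range(n_features))
        eps = max(self.var_smoothing * largest_var, 1e-12)
        self.params_ = {}
        for label, rows in by_class.items():
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            variances = [self._variance(col) + eps for col in columns]
            log_prior = math.log(len(rows) / len(X))
            self.params_[label] = (log_prior, means, variances)
        return self

    def _log_joint(self, row, label):
        # log P(class) + sum over features of log N(x | mean, variance)
        log_prior, means, variances = self.params_[label]
        total = log_prior
        for x, mu, var in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        return total

    def predict(self, X):
        # Pick the class with the highest log joint probability for each row
        return [max(self.params_, key=lambda label: self._log_joint(row, label)) for row in X]


# Toy data: two well-separated clusters labelled 0 and 1
X_train = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y_train = [0, 0, 0, 1, 1, 1]
clf = TinyGaussianNB().fit(X_train, y_train)
print(clf.predict([[1.1, 2.1], [5.1, 5.9]]))  # [0, 1]
```

The sketch follows the same fit/predict shape as GaussianNB above, but it returns a plain list and loops in Python, so it is only practical for tiny inputs.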