10 things I didn’t know about Data Science a year ago

In my article My personal road map for learning data science in 2018 I wrote about how I try to tackle the data science knowledge sphere. Due to the fact that 2018 is slowly coming to an end I think it is time for a little wrap up.

What are the things I learned about Data Science in 2018? Here we go:

1. The difference between Data Science, Machine Learning, Deep Learning and AI

A picture says more than a thousand words.

2. The difference between supervised and unsupervised learning

Supervised Learning

You have training and test data with labels. Labels tell You to which e.g. class a certain data item belongs. Image you have images of pets and the labels are the name of the pets.

Unsupervised Learning

Your data doesn’t have labels. Your algorithm e.g. k-means clustering need to figure out a structure given only the data

3. The areas of applied machine learning

are described here: The Essence of Machine Learning and Data Science Overview

4. Bayes Theorem

In my article Bayes theorem I elaborated about the base rate fallacy and in naive bayes I recapped the second lesson from udacity’s UD120 Intro to Machine Learning

5. Precision and Recall and ROC

In my article classification: precision and recall I wrote about different useful measures to evaluate the quality of a supervised learning algorithm.

In Receiver Operating Characteristic I wrote about another useful measures the ROC.

6. Visualization with matplotlib

Matplotlib is a really good starting point for visualization. I wrote about it in Introduction to matplotlib, Matplotlib – Part 2, Scatterplot with matplotlib

7. Math with numpy

I wrote some articles about the usage of numpy but only scraped the surface of this mighty library

8. Image manipulation with OpenCV

Intro to OpenCV with Python

9. JuPyter Notebooks

Sometimes I love them sometimes I hate them. I wrote an Introduction to JuPyter Notebook

10. Podcasts

In 2018 I’ve listened to a bunch of great podcasts on iTunes: