## 10 things I didn’t know about Data Science a year ago

In my article *My personal road map for learning data science in 2018* I wrote about how I try to tackle the data science knowledge sphere. Since 2018 is slowly coming to an end, I think it is time for a little wrap-up.

What are the things I learned about Data Science in 2018? Here we go:

## Classification: Precision and Recall

In the realm of Data Science you’ll sooner or later encounter the terms “Precision” and “Recall”. But what do they mean?

## Clarification

Living together with little kids, you very often run into classification issues:

My daughter really likes dogs, so seeing a dog is something positive. When she sees a normal dog, e.g. a Labrador, and proclaims: “Look, there is a dog!” —

That’s a True Positive (TP).
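With the four outcome types (TP, FP, FN, TN) in hand, precision and recall are just two ratios. A minimal sketch with made-up counts for the dog example:

```python
# Made-up counts, just to illustrate the definitions.
tp = 8   # real dogs correctly called "dog"
fp = 2   # non-dogs mistakenly called "dog"
fn = 4   # real dogs that were missed

precision = tp / (tp + fp)  # of everything called a dog, how much really was one?
recall = tp / (tp + fn)     # of all real dogs, how many were found?

print(precision)  # 0.8
print(recall)     # 0.666...
```

Precision punishes false alarms, recall punishes misses — which is why you usually look at both.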

## Linear Algebra with numpy – Part 1

NumPy is a package for scientific computing in Python.

`import numpy as np`

The most important data structure is ndarray, which is short for n-dimensional array.

You can convert a list to a NumPy array with the `array` method:

```
my_list = [1, 2, 3, 4]
my_array = np.array(my_list)
```

You can also convert an array back to a list.
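A minimal sketch of the round trip — list to array and back — using NumPy’s `tolist` method:

```python
import numpy as np

my_array = np.array([1, 2, 3, 4])
back_to_list = my_array.tolist()   # ndarray -> plain Python list

print(type(my_array))   # <class 'numpy.ndarray'>
print(back_to_list)     # [1, 2, 3, 4]
print(my_array.shape)   # (4,)
```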

## JuPyteR

Do you know the feeling of being late to a party when encountering something new?

But when you actually start telling others about it, you realize that it is not common knowledge at all, e.g. Jupyter Notebooks.

What is a Jupyter notebook?

## Data Science Datasets: Iris flower data set

The Iris flower data set or Fisher’s Iris data set became a typical test case for many statistical classification techniques in machine learning such as support vector machines.

It is sometimes called Anderson’s Iris data set because Edgar Anderson collected the data to quantify the morphological variation of Iris flowers of three related species.

This data set can be imported from scikit-learn as follows:

```
from sklearn import datasets

iris = datasets.load_iris()
iris.data.shape, iris.target.shape
```
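Since the post mentions support vector machines as a typical use of this data set, here is a minimal sketch of fitting one on Iris — the split ratio and default parameters are just illustrative choices of mine:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()

# Hold out a quarter of the 150 samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)

clf = SVC().fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out data
```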

## Questions

Data Science tries to answer one of the following questions:

- Classification -> “Is it A or B?”
- Clustering -> “Are there groups which belong together?”
- Regression -> “How will it develop in the future?”
- Association -> “Which things frequently happen together?”

There are two ways to tackle these problem domains with machine learning:

1. Supervised Learning
2. Unsupervised Learning

## Supervised Learning

You have training and test data with labels. A label tells you, for example, to which class a certain data item belongs. Imagine you have images of pets and the labels are the names of the pets.
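The pet example can be sketched in a few lines. The data here is entirely made up (one weight feature per animal), and the choice of a k-nearest-neighbours classifier is mine, just to show the label mechanics:

```python
from sklearn.neighbors import KNeighborsClassifier

# Made-up "pet" data: one feature (weight in kg) and one label per item.
weights = [[4.0], [5.0], [30.0], [28.0]]
labels = ["cat", "cat", "dog", "dog"]

# Supervised: the algorithm learns from feature/label pairs.
clf = KNeighborsClassifier(n_neighbors=1).fit(weights, labels)

print(clf.predict([[4.5], [31.0]]))  # ['cat' 'dog']
```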

## Unsupervised Learning

Your data doesn’t have labels. Your algorithm, e.g. k-means clustering, needs to figure out a structure given only the data.
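A minimal k-means sketch on made-up, unlabeled points — two obvious groups that the algorithm has to discover on its own:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled 2-D points forming two clear groups (invented for illustration).
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
              [8.0, 8.1], [7.9, 8.0], [8.1, 7.9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Each point gets a cluster index; which group is 0 and which is 1 is arbitrary.
print(kmeans.labels_)
```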

## My personal road map for learning data science in 2018

I got confused by all the buzzwords: data science, machine learning, deep learning, neural nets, artificial intelligence, big data, and so on and so on.

As an engineer I like to put some structure to the chaos. Inspired by *Roadmap: How to Learn Machine Learning in 6 Months* and Tetiana Ivanova’s *How to become a Data Scientist in 6 months – a hacker’s approach to career planning*, I built my own learning road map for this year.

## Bayes’ Theorem

Imagine that you come home from a party and you are stopped by the police. They ask you to take a drug test and you accept. The test result is positive. You are guilty.

But wait a minute! Is it really that simple?
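No, it isn’t — and Bayes’ theorem shows why. A quick back-of-the-envelope calculation with entirely hypothetical numbers (a 1% base rate of drug users, a test that catches 99% of users but also flags 5% of non-users):

```python
# All numbers are hypothetical, chosen only to illustrate Bayes' theorem.
p_user = 0.01        # prior: 1% of people tested are actually users
p_pos_user = 0.99    # sensitivity: P(positive | user)
p_pos_clean = 0.05   # false positive rate: P(positive | not a user)

# Total probability of a positive test result.
p_pos = p_pos_user * p_user + p_pos_clean * (1 - p_user)

# Bayes' theorem: P(user | positive).
p_user_given_pos = p_pos_user * p_user / p_pos

print(round(p_user_given_pos, 3))  # 0.167
```

With these numbers, a positive test means only about a 17% chance of actual drug use — the low base rate dominates, which is exactly the trap the scenario above sets up.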