New Blog Post

The Normal Distribution

Diving deeper into data science I started to brush up my knowledge about math especially statistics. The Mother of all Distributions The normal distribution was formulated by Carl Friedrich Gauß in 1809 and can be implemented in Python like the following : def normal_distribution_pdf(x, mu=0, sigma=1): sqrt_two_pi = math.sqrt(2*math.pi) return math.exp(-(x-mu)**2 / 2 / sigma**2)…

New Blog Post

What is Cross-Validation in Data Science?

Motivation Cross-validation is a technique to validate the quality of your machine learning model. For validating your model you split your training data into a training and a test data set. ———————————————– | | | | training data | test data | | | | ———————————————– More training data means a better model, more test…

Introduction to Jupyter Notebook

JuPyteR Do You know the feeling of being already late to a party when encountering something new? But when you actually start telling others about it, you realize that it is not too common knowledge at all, e.g. Jupyter Notebooks. What is a Jupyter notebook? In my own words: a browser-based document-oriented command line style…

New Blog Post

Data Science Datasets: Iris flower data set

Motivation When you are going to learn some data science the aquisition of data is often the first step. To get you started scikit-learn comes with a bunch of so called “toy datasets”. One of them is the Iris dataset. Prerequisites & Imports Besides scikit-learn we will use pandas for data handling and matplotlib with…