April 25, 2018 - Creatronix

The Normal Distribution

Data Science & SQL, Python | By Jörn | April 25, 2018

Diving deeper into data science, I started to brush up on my knowledge of math, especially statistics.

The Mother of all Distributions

The normal distribution was formulated by Carl Friedrich Gauß in 1809 and can be implemented in Python as follows:

    def normal_distribution_pdf(x, mu=0, sigma=1):
        sqrt_two_pi = math.sqrt(2*math.pi)
        return (1 / (sqrt_two_pi * sigma))…
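The teaser cuts the function off in the middle of the return statement. As a rough sketch of where it is probably heading, here is the textbook Gaussian density written with the same function name and parameters. The exponential term and the check on sigma are assumptions filled in from the standard formula, not taken from the original post.

```python
import math


def normal_distribution_pdf(x, mu=0, sigma=1):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    # Assumption: reject non-positive sigma, since the density is undefined there.
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    sqrt_two_pi = math.sqrt(2 * math.pi)
    # Standardize x, then apply exp(-z^2 / 2) scaled by 1 / (sqrt(2*pi) * sigma).
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sqrt_two_pi * sigma)


if __name__ == "__main__":
    # The peak of the standard normal curve sits at x = 0.
    print(normal_distribution_pdf(0))
    # The curve is symmetric around mu.
    print(normal_distribution_pdf(-1), normal_distribution_pdf(1))
```

With the default parameters the peak value at x = 0 is 1 / sqrt(2π), roughly 0.3989.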

What is Cross-Validation in Data Science?

Data Science & SQL, Python | By Jörn | April 25, 2018

Motivation

Cross-validation is a technique for validating the quality of your machine learning model. To validate your model, you split your data into a training data set and a test data set.

-----------------------------------------------
|                          |                  |
|      training data       |    test data     |
|                          |                  |
-----------------------------------------------

More training data means a better model, more test…
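To make the split concrete, here is a minimal sketch of k-fold cross-validation in plain Python. Every sample lands in the test set exactly once, and the remaining samples form the training set for that round. The helper name `k_fold_indices` and the choice of contiguous, unshuffled folds are assumptions made for illustration. The teaser does not show which variant the post goes on to use.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 5))

if __name__ == "__main__":
    for train, test in folds:
        print("train:", train, "test:", test)
```

In practice you would usually shuffle the indices first, or use a library routine such as scikit-learn's `KFold`, so that an ordered data set does not put one class into a single fold.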

Introduction to Jupyter Notebook

Data Science & SQL, Python, Tools | By Jörn | April 25, 2018

JuPyteR

Do you know the feeling of arriving late to a party when you encounter something new? But when you actually start telling others about it, you realize that it is not common knowledge at all, e.g. Jupyter Notebooks.

What is a Jupyter notebook?

In my own words: a browser-based document-oriented command line style…

Data Science Datasets: Iris flower data set

Data Science & SQL | By Jörn | April 25, 2018

Motivation

When you start learning data science, the acquisition of data is often the first step. To get you started, scikit-learn comes with a bunch of so-called “toy datasets”. One of them is the Iris dataset.

Prerequisites & Imports

Besides scikit-learn we will use pandas for data handling and matplotlib with…
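As a small sketch of that first step, the snippet below loads the Iris toy dataset from scikit-learn and wraps it in a pandas DataFrame. It assumes scikit-learn and pandas are installed. The column name `species` is chosen here for illustration and is not taken from the original post, which is cut off before its own code.

```python
import pandas as pd
from sklearn.datasets import load_iris

# load_iris() returns a Bunch with the measurements, numeric targets and names.
iris = load_iris()

# One row per flower, one column per measured feature.
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map the numeric class labels to readable species names ("species" is an assumed column name).
df["species"] = [iris.target_names[i] for i in iris.target]

if __name__ == "__main__":
    print(df.shape)
    print(df.head())
```

The resulting frame has 150 rows: four measurement columns plus the added species column.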
