Python 3.7 – data classes

A cool new feature made its way into Python 3.7: data classes. If you've already read my article about Lombok, the concept isn't new at all: with the new decorator @dataclass you can save a huge amount of time because the methods __init__(), __repr__() and __eq__() are created for you! from dataclasses import dataclass @dataclass…
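The excerpt above stops right at the decorator, so here is a minimal sketch of what such a class might look like. The class name `Point` and its fields are hypothetical and not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical example class, not from the original article
    x: int
    y: int = 0  # fields may carry default values


p = Point(1, 2)          # uses the generated __init__
print(p)                 # uses the generated __repr__
print(p == Point(1, 2))  # uses the generated __eq__, which compares field by field
```

Without the decorator, all three methods would have to be written by hand, and two instances with equal fields would not compare as equal.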

numpy random choice

With numpy you can easily create test data with random_integers and randint.

numpy.random.randint(low, high=None, size=None, dtype='l')
numpy.random.random_integers(low, high=None, size=None)

random_integers includes the high boundary while randint does not.

>>> import numpy as np
>>> np.random.random_integers(5)
4
>>> np.random.random_integers(5, size=(5))
array([5, 3, 4, 1, 4])
>>> np.random.random_integers(5, size=(5, 4))
array([[2, 3, 3, 5], [1, 3, 1, 3],…
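To make the boundary difference concrete, the following sketch uses only randint and shifts the upper bound by one to imitate the inclusive behaviour described above. The fixed seed and the sample size of 1000 are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

# randint excludes the upper bound: with a single argument the values lie in 0..4
exclusive = np.random.randint(5, size=1000)

# to draw from 1..5 inclusive, the way random_integers(5) does, pass high + 1
inclusive = np.random.randint(1, 5 + 1, size=1000)

print(exclusive.min(), exclusive.max())
print(inclusive.min(), inclusive.max())
```

The first pair of numbers never leaves the range 0 to 4, the second never leaves 1 to 5.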

Linear Algebra with numpy – Part 1

Numpy is a package for scientific computing in Python. It is blazing fast due to its implementation in C. It is often used together with pandas, matplotlib and Jupyter notebooks; together these packages are often referred to as the data science stack. Installation: You can install numpy via pip: pip install numpy. Basic Usage: In the datascience…

Python pip and virtualenv

After working for a couple of years with Python and external dependencies, I've run into the same kinds of problems again and again. Bad habits: Say you have a global Python installation under e.g. C:\Python36 on Windows. When you start working on your first Python project you want to use external packages and you encounter…

Python Type Checking

Python is a dynamically typed language, which makes it easy and fun to program. But sometimes, especially in bigger projects, it can become quite cumbersome when you only receive errors at run time. Take a hypothetical example where we define a function that multiplies integers: def multiply(a, b): return a * b print(multiply("I", "You")) It…
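The excerpt is cut off before it says how the article tackles the problem. As a sketch, assuming the approach is type hints checked by an external static checker such as mypy (an assumption, not confirmed by the excerpt), the function could be annotated like this:

```python
def multiply(a: int, b: int) -> int:
    """Multiply two integers; the annotations are hints, not run-time checks."""
    return a * b


print(multiply(2, 3))

# Python itself ignores the annotations, so the bad call still fails only at run time.
try:
    multiply("I", "You")
except TypeError as exc:
    print(f"run-time error: {exc}")
```

A static checker reading the annotations could flag the second call before the program is ever run, which is the point of adding them.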

The Normal Distribution

Diving deeper into data science, I started to brush up my knowledge of math, especially statistics. The Mother of all Distributions: The normal distribution was formulated by Carl Friedrich Gauß in 18XX and can be implemented in Python like this: def normal_distribution(x, mu=0, sigma=1): sqrt_two_pi = math.sqrt(2*math.pi) return math.exp(-(x-mu)**2 / 2 / sigma**2)…
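Since the function in the excerpt is truncated, here is a self-contained sketch of the standard normal density formula, f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). The name `normal_pdf` is chosen for this sketch and is not necessarily what the article uses.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# The density peaks at the mean; for mu=0, sigma=1 the peak is 1/sqrt(2*pi).
print(normal_pdf(0.0))
```

The curve is symmetric around mu, so `normal_pdf(mu + d)` and `normal_pdf(mu - d)` give the same value for any d.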

Data Science: Cross-Validation

To validate your model you need to split your data into a training and a test data set. More training data generally means a better model, more test data means better validation. But because the amount of data to train/test the model is limited, you have to decide in which ratio of training vs test data…
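One common way around the fixed-ratio dilemma is k-fold cross-validation, where every sample is used for testing exactly once. The sketch below is a plain-Python illustration with a hypothetical helper `k_fold_splits`; it does not shuffle the data and is not taken from the article.

```python
from typing import Iterator, List, Tuple


def k_fold_splits(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k consecutive folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in k_fold_splits(6, 3):
    print(train, test)
```

Each round trains on k-1 folds and tests on the remaining one, so the model is evaluated on all of the data without ever being tested on samples it was trained on in that round.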

Introduction to Jupyter Notebook

JuPyteR Do you know the feeling of being late to the party when encountering something new? But when you actually start telling others about it, you realize that it is not common knowledge at all, e.g. Jupyter Notebooks. What is a Jupyter notebook? In my own words: a browser-based document-oriented command line style…