New Blog Post

Python datetime and format

One of the things I always forget is how to handle dates and times in Python. So, a message to myself: the strftime method ("string format time") is used for formatting:
    import datetime
    start_date = datetime.datetime.now()
    DATE_FORMAT = '%d/%m/%Y %H:%M'
    print(start_date.strftime(DATE_FORMAT))
Here is a nice little cheatsheet.
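A small sketch of the same idea, using a fixed timestamp instead of now() so that the output is reproducible. The sample date and the names `formatted` and `parsed` are made up for illustration; strptime is the counterpart of strftime and parses a string back into a datetime:

```python
import datetime

DATE_FORMAT = '%d/%m/%Y %H:%M'

# A fixed, made-up timestamp instead of now(), so the result is predictable
start_date = datetime.datetime(2019, 3, 7, 14, 5)

# datetime -> string
formatted = start_date.strftime(DATE_FORMAT)
print(formatted)  # 07/03/2019 14:05

# strptime ("string parse time") goes the other way: string -> datetime
parsed = datetime.datetime.strptime(formatted, DATE_FORMAT)
print(parsed == start_date)  # True
```

The round trip only matches because the sample timestamp has no seconds; the format above drops anything finer than minutes.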

New Blog Post

Python3: ChainMap

Since Python 3.3 you can chain dictionaries which contain the same key in a prioritized order:
    from collections import ChainMap
    prio_1 = {"param_1": "foo"}
    prio_2 = {"param_1": "foobar", "param_2": "bar"}
    combined = ChainMap(prio_1, prio_2)
    print(combined["param_1"])  # outputs 'foo'
    print(combined["param_2"])  # outputs 'bar'
The param_1 from the prio_1 dictionary is dominant, so it isn't overwritten by…
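A short sketch of how the chain behaves beyond simple lookups, reusing the two dictionaries from above. The value "baz" is made up for illustration:

```python
from collections import ChainMap

prio_1 = {"param_1": "foo"}
prio_2 = {"param_1": "foobar", "param_2": "bar"}
combined = ChainMap(prio_1, prio_2)

# Lookups search the dictionaries from left to right
print(combined["param_1"])  # foo

# The ChainMap is only a view: the source dictionaries stay untouched
print(prio_2["param_1"])  # foobar

# Writes always go to the first dictionary in the chain
combined["param_2"] = "baz"
print(prio_1)             # {'param_1': 'foo', 'param_2': 'baz'}
print(prio_2["param_2"])  # bar
```

Because writes land in the first dictionary only, the lower-priority dictionaries can serve as read-only defaults.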

The Agile Manifesto

If you work in an agile team, e.g. with Scrum, you have probably heard of the Agile Manifesto. Formulated in 2001, it has influenced a lot of software developers and methodologies like Scrum. The Agile Manifesto consists of 4 values and 12 principles: Values Principles Our highest priority is to satisfy the customer through early and…

What is Big Data

Big data is a buzzword nowadays. But when is data "big data"?
Three Vs of Big Data
Big data spans three dimensions: Volume, Velocity, Variety.
Volume
Everything which doesn't fit on your local hard drive anymore can be considered big. Think >10 terabytes.
Velocity
The more real-time your data becomes the more you…

Distributing your own package on PyPI

In Regular Expressions Demystified I developed a little Python package and distributed it via PyPI. I wanted to publish my second self-written package as well, but coming back after almost a year, some things have changed in the world of PyPI, i.e. the old tutorials aren't working anymore. So I wrote this article to bring…

Introduction to matplotlib

Overview
matplotlib is the workhorse of data science visualization. The module pyplot gives us MATLAB-like plots. You can install it via
    pip install matplotlib
The most basic plot is done with the "plot" function. It looks like this:
    import matplotlib.pyplot as plt
    plt.plot([0, 1, 2, 3], [0, 1, 2, 3])
    plt.show()
The plot function takes…

Scatterplot with matplotlib

If you are already familiar with the basic plot from the introduction to matplotlib, here is another type of plot used in data science. A very basic visualization is the scatter plot:
    import numpy as np
    import matplotlib.pyplot as plt
    N = 100
    x = np.random.rand(N)
    y = np.random.rand(N)
    plt.scatter(x, y)
    plt.show()
Color of the…

Feature Scaling

What is Feature Scaling?
Feature scaling is an important pre-processing step for some machine learning algorithms. Imagine you have three friends whose individual weight and height you know. You would like to deduce Christian's t-shirt size from David's and Julia's by looking at their height and weight.
Name Height in m Weight in…
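To make the idea concrete, here is a minimal sketch of min-max scaling, one common form of feature scaling (not necessarily the one the article goes on to use). The helper `min_max_scale` and all numbers are made up for illustration and are not taken from the article's table:

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # All values are identical; there is nothing to spread out
        return [0.0 for _ in values]
    return [(v - lowest) / span for v in values]


# Made-up example values, not taken from the article's table
heights_m = [1.60, 1.80, 2.00]
weights_kg = [50.0, 75.0, 100.0]

scaled_heights = min_max_scale(heights_m)
scaled_weights = min_max_scale(weights_kg)

print(scaled_heights)
print(scaled_weights)  # [0.0, 0.5, 1.0]
```

After scaling, height and weight both live on the same 0 to 1 range, so a difference of a few kilograms no longer dwarfs a difference of a few centimetres expressed in metres.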