Python pip and virtualenv

After working for a couple of years with Python and external dependencies, I’ve run into the same kind of problems again and again.

Bad habits

Say you have a global Python installation under e.g. C:\Python27 on Windows. When you start working on your first Python project, you want to use external packages, and you encounter pip as the dependency management tool (pip has been part of the Python installation since 2.7.9 / 3.4). So far so good.

But you keep installing all the packages into your global python installation.
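One way out of this habit is an isolated environment per project. The following is a minimal sketch, assuming Python 3, where the standard-library venv module covers the same ground as virtualenv. The helper name create_project_env is illustrative and not part of any library.

```python
import venv
from pathlib import Path


def create_project_env(env_dir, with_pip=True):
    """Create an isolated environment in env_dir and return its absolute path."""
    env_path = Path(env_dir).resolve()
    # system_site_packages=False keeps globally installed packages out of the environment
    venv.create(env_path, system_site_packages=False, with_pip=with_pip)
    return env_path
```

Calling create_project_env(".venv") inside a project folder and then activating the environment (.venv\Scripts\activate on Windows) makes pip install packages there instead of into the global installation.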

JavaScript: dot vs bracket notation

While linting my code, JSHint gave me the “hint” that I should prefer dot notation over bracket notation.

"testcase": data.finding["testcase"],

['testcase'] is better written in dot notation.

What is that?

  • Accessing members with “.” is called “dot notation”.
  • Accessing them with [] is called “bracket notation”.


The Normal Distribution

Diving deeper into data science, I started to brush up on my math, especially statistics.

The Mother of all Distributions

The normal distribution was formulated by Carl Friedrich Gauß in 18XX and can be implemented in Python as follows:

import math

def normal_distribution(x, mu=0, sigma=1):
    # density of N(mu, sigma^2): exp(-(x-mu)^2 / (2*sigma^2)) / (sigma * sqrt(2*pi))
    sqrt_two_pi = math.sqrt(2*math.pi)
    return math.exp(-(x-mu)**2 / 2 / sigma**2) / (sqrt_two_pi * sigma)
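A quick sanity check for such a function is to compare it against the standard library. The sketch below assumes Python 3.8 or newer (for statistics.NormalDist) and uses the density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). The names normal_pdf and max_deviation are made up for this example.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def max_deviation(mu, sigma, points):
    """Largest absolute difference to statistics.NormalDist over the given points."""
    reference = NormalDist(mu, sigma)
    return max(abs(normal_pdf(x, mu, sigma) - reference.pdf(x)) for x in points)
```

If max_deviation stays close to zero for a sigma other than 1, the scaling by sigma is right. With sigma=1 alone, a misplaced sigma factor would go unnoticed.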

Data Science: Cross-Validation

To validate your model, you need to split your data into a training set and a test set.

More training data generally means a better model; more test data means a more reliable validation.

But because the amount of data available to train and test the model is limited, you have to decide on the ratio of training to test data.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn import svm

iris = datasets.load_iris()

Sample a training set while holding out 40% of the data for testing:

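X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)
from sklearn.model_selection import cross_val_score
clf = svm.SVC(kernel='linear', C=1)
# 5-fold cross-validation on the full data set; returns one score per fold
scores = cross_val_score(clf, iris.data, iris.target, cv=5)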
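What cv=5 does behind the scenes can be sketched without any library: the samples are cut into five folds, and each fold serves once as the test set while the remaining four are used for training. The function below is a simplified illustration with a made-up name (k_fold_indices). It does not shuffle, and it does not stratify the folds by class, which scikit-learn does for classifiers.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

Every sample ends up in exactly one test fold, so all of the data is used for both training and validation, just never in the same round.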

Five Minutes with Ingo: Cross Validation


Data Science Datasets: Iris flower data set

The Iris flower data set or Fisher’s Iris data set became a typical test case for many statistical classification techniques in machine learning such as support vector machines. It is sometimes called Anderson’s Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species.

This data set can be imported from scikit-learn as follows:

from sklearn import datasets

iris = datasets.load_iris()


Fiddling around with a Java project from my friend Thomas Berger, I encountered Lombok.

Lombok generates, for example, setters and getters for data classes automagically. All you have to do is annotate a class with @Data:

import lombok.Data;

@Data
public class CinemaEvent {
    private String location;
    private String url;
}

For IntelliJ there is a Lombok plugin. After activating the plugin, the generated getters and setters show up in the structure view.

And there are many more convenience wrapper annotations! Try it out!

Useful Outlook Settings

Being forced to use Outlook at work, I use the following configuration:

  • Disable Popups
    • Options -> Mail -> Message Arrival
  • Default Reminder -> 5 minutes
    • Options -> Calendar -> Calendar options -> Default reminders
  • Activate Week Number
    • Options -> Calendar -> Display options -> Show week numbers in the month view
  • Enable Calendar in Mail View
    • Mail View -> View -> ToDo-Bar -> Calendar


KnockoutJS: passing parameters to components


<my-component params='from: "foo"'></my-component>


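class MyViewModel {
    constructor(params) {
        // params.from === "foo" here, as passed via the params attribute
    }
}

ko.components.register('table-edit', {
    // template: ... (required by Knockout, not shown here)
    viewModel: {
        // factory function: receives the same params object
        createViewModel(params) {
            return new TableViewModel(params);
        }
    }
});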