Python pip and virtualenv

After working for a couple of years with Python and external dependencies I’ve ran again and again into the same kind of problems.

Bad habits

Say you have a global python installation under e.g. C:\Python27 on Windows. When you start working on your first python project you want to use external packages and you encounter pip as dependency management tool. (pip is part of the python installation since 2.7.9 / 3.4) So far so good.

But you keep installing all the packages into your global python installation. Continue reading “Python pip and virtualenv”

The Normal Distribution

Diving deeper into data science I started to brush up my knowledge about math especially statistics.

The Mother of all Distributions

The normal distribution was formulated by Carl Friedrich Gauß in 18XX and can be implemented in Python like the following :

def normal_distribution(x, mu=0, sigma=1):
    sqrt_two_pi = math.sqrt(2*math.pi)
    return math.exp(-(x-mu)**2 / 2 / sigma**2) / sqrt_two_pi * sigma

Data Science: Cross-Validation

For validating your model You need to split your data into a training and a test data set.

More training data means a better model, more test data means better validation.

But because the amount of data to train/test the model is limited you have to decide in which ratio of training vs test data you want to split your data.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn import svm

iris = datasets.load_iris()
iris.data.shape, iris.target.shape

Sample a training set while holding out 40% of the data for testing:

X_train, X_test, y_train, y_test = train_test_split( iris.data, iris.target, test_size=0.4, random_state=0)
from sklearn.model_selection import cross_val_score
clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, iris.data, iris.target, cv=5)

Five Minutes with Ingo: Cross Validation

 

Distributing your own package on PyPi

In Regular Expressions Demystified I developed a little python package and distributed it via PyPi.

I wanted to publish my second self-written package as well, but coming back after almost a year, some things have changed in the world of PyPi, i.e. the old tutorials aren’t working anymore.

What You have to do is the following thing in a nutshell:

python setup.py sdist
twine upload dist/*

The complete guide can be found here.