### Lesson 3: Support Vector Machines

The term Support Vector Machine (SVM) is a bit misleading. It is just the name of a very clever algorithm invented by two Russians in the 1960s. SVMs are used for classification and regression. For classification, an SVM finds the hyperplane between two classes of data that separates them best, i.e. with the largest margin.
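
To make the idea of a separating hyperplane concrete, here is a minimal sketch in plain Python. It does not train an SVM; it only shows how a given hyperplane `w·x + b = 0` classifies points and how wide its margin is. The weights, the bias and the sample points are hand-picked assumptions for illustration, not taken from the lesson.

```python
import math


def decision_value(w, b, x):
    """Signed score w·x + b; its sign tells on which side of the hyperplane x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin(w, b, points):
    """Smallest geometric distance from any point to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(decision_value(w, b, x)) / norm for x in points)


# Hypothetical hyperplane x1 + x2 - 3 = 0 and four hand-picked points.
w, b = (1.0, 1.0), -3.0
positives = [(2.0, 2.0), (3.0, 3.0)]
negatives = [(0.0, 0.0), (1.0, 1.0)]

for point in positives + negatives:
    print(point, classify(w, b, point))
print("margin:", margin(w, b, positives + negatives))
```

An SVM searches for the `w` and `b` that make this margin as large as possible; the points that end up closest to the hyperplane are the support vectors.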

### Linear Algebra with numpy

NumPy is a package for scientific computing in Python. It is fast because its core is implemented in C. It is often used together with pandas, matplotlib and Jupyter notebooks; these packages are often referred to as the data science stack. Installation: You can install numpy via pip: `pip install numpy`. Basic Usage: In the datascience…

### Python pip and virtualenv

After working with Python and external dependencies for a couple of years, I have run into the same kinds of problems again and again. Bad habits: Say you have a global Python installation under e.g. C:\Python36 on Windows. When you start working on your first Python project you want to use external packages and you encounter…

### JavaScript: dot vs bracket notation

While linting my code, jshint gave me the “hint” that I should prefer dot notation over bracket notation. For the line `"testcase": data.finding["testcase"],` it reported: `['testcase']` is better written in dot notation. What is that? Accessing members with `.` is called “dot notation”. Accessing them with `[]` is called “bracket notation”.

### Python Type Checking

Python is a dynamically typed language, which makes it easy and fun to program in. But sometimes, especially in bigger projects, it can become quite cumbersome when errors only surface at run time. Take a hypothetical example where we define a function that multiplies integers: `def multiply(a, b): return a * b`, called as `print(multiply("I", "You"))`. It…
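
The excerpt breaks off before showing a remedy, so the following is only a sketch of one common approach: annotating the function with type hints. It assumes a static checker such as mypy is run separately; `multiply_checked` is a hypothetical helper added here to show an explicit run-time check and is not part of the original post.

```python
def multiply(a: int, b: int) -> int:
    """Multiply two integers; the annotations document the intended types."""
    return a * b


def multiply_checked(a: int, b: int) -> int:
    """Hypothetical variant that rejects non-integers explicitly at run time."""
    if not isinstance(a, int) or not isinstance(b, int):
        raise TypeError(
            f"expected two ints, got {type(a).__name__} and {type(b).__name__}"
        )
    return a * b


print(multiply(3, 4))

try:
    # A static checker flags this call before the program runs;
    # without one, it fails only here, at run time.
    multiply("I", "You")
except TypeError as error:
    print(f"run-time failure: {error}")
```

Python itself does not enforce the annotations: `multiply("I", 2)` still runs and returns `"II"`. The hints pay off once a static checker reads them, or when a function validates its arguments itself as `multiply_checked` does.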

### The Normal Distribution

Diving deeper into data science, I started to brush up on my math, especially statistics. The Mother of all Distributions: The normal distribution was formulated by Carl Friedrich Gauß in 1809 and can be implemented in Python as follows: `def normal_distribution_pdf(x, mu=0, sigma=1): sqrt_two_pi = math.sqrt(2*math.pi) return (1 / (sqrt_two_pi * sigma))…`
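
Since the excerpt cuts the function off, here is a separate, self-contained sketch of the standard normal density, f(x) = 1 / (σ·√(2π)) · exp(−(x − μ)² / (2σ²)). The name `normal_pdf_sketch` is chosen here for illustration; it is not claimed to match the rest of the original implementation.

```python
import math


def normal_pdf_sketch(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Probability density of the normal distribution N(mu, sigma^2) at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# The density peaks at the mean and is symmetric around it.
print(normal_pdf_sketch(0.0))
print(normal_pdf_sketch(-1.0), normal_pdf_sketch(1.0))
```

For the standard normal distribution (`mu=0`, `sigma=1`) the peak value is 1/√(2π), roughly 0.3989.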

### What is Cross-Validation in Data Science?

Motivation: Cross-validation is a technique for validating the quality of your machine learning model. To validate your model, you split your available data into a training set and a test set: | training data | test data |. More training data means a better model, more test…
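
The excerpt stops before describing the procedure, so the sketch below assumes the common k-fold variant: the data is cut into k folds, and each fold serves once as test data while the remaining folds form the training data. `k_fold_splits` is a hypothetical helper written in plain Python, not a library function.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds get one extra sample each.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test_indices = indices[start:stop]
        train_indices = indices[:start] + indices[stop:]
        yield train_indices, test_indices
        start = stop


for train, test in k_fold_splits(n_samples=10, k=3):
    print("train:", train, "test:", test)
```

Every sample lands in the test set exactly once, so all of the data is used for both training and testing across the k rounds. In practice the indices are usually shuffled first; that step is left out here to keep the sketch short.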

### Introduction to Jupyter Notebook

JuPyteR: Do you know the feeling of arriving late to a party when you encounter something new? Yet when you start telling others about it, you realize that it is not common knowledge at all. Jupyter Notebooks are one example. What is a Jupyter notebook? In my own words: a browser-based document-oriented command line style…

### Data Science Datasets: Iris flower data set

Motivation: When you start learning data science, acquiring data is often the first step. To get you started, scikit-learn comes with a bunch of so-called “toy datasets”. One of them is the Iris dataset. Prerequisites & Imports: Besides scikit-learn we will use pandas for data handling and matplotlib with…

### Lombok

Fiddling around with a Java project from my friend Thomas Berger, I encountered Lombok. Among other things, Lombok automagically generates setters and getters for data classes. All you have to do is annotate a class with `@Data`: `import lombok.Data; @Data public class CinemaEvent { private String location; private String url; }` For IntelliJ there is a lombok…