What is Cross-Validation in Data Science?

Motivation: Cross-validation is a technique for validating the quality of your machine learning model. To validate a model, you split your data into a training data set and a test data set.

-----------------------------------------
|     training data     |   test data   |
-----------------------------------------

More training data means a better model, more test…
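
The excerpt breaks off before the procedure itself, so the following is only a minimal sketch of the general k-fold idea: the data is cut into k parts and each part takes one turn as the test data. The function name `k_fold_splits` and the choice of contiguous, unshuffled folds are assumptions made for illustration, not taken from the post.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    for train, test in k_fold_splits(10, 3):
        print("train:", train, "test:", test)
```

Every sample ends up in the test data exactly once, so all of the data is used for both training and testing across the k rounds.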

Introduction to Jupyter Notebook

JuPyteR: Do you know the feeling of arriving late to a party when you encounter something new? But when you actually start telling others about it, you realize that it is not common knowledge at all. Jupyter Notebooks are one example. What is a Jupyter notebook? In my own words: a browser-based document-oriented command line style…

Lombok

While fiddling around with a Java project from my friend Thomas Berger, I encountered Lombok. Among other things, Lombok automatically generates setters and getters for data classes. All you have to do is annotate a class with @Data: import lombok.Data; @Data public class CinemaEvent { private String location; private String url; } For IntelliJ there is a lombok…

pip optional dependencies

Sometimes you want to make your Python package usable in different situations, e.g. with Flask, Bottle, or Django. If you want to minimize dependencies, you can declare an optional dependency in setup.py: extras_require={ 'flask': ['Flask>=0.8', 'blinker>=1.1'] } Now you can install the library together with its optional dependencies: pip install raven[flask]
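
On the library side, code that relies on an optional extra has to cope with the extra not being installed. The following is a minimal sketch of one common way to do that, assuming the optional module is importable as `flask`; the names `is_installed`, `HAS_FLASK`, and `describe_integration` are hypothetical and not part of any real package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Evaluated once at import time; True only if the optional extra was installed.
HAS_FLASK = is_installed("flask")


def describe_integration():
    """Report whether the optional Flask integration is available."""
    if HAS_FLASK:
        return "Flask integration enabled"
    return "Flask integration disabled; install the 'flask' extra to enable it"


if __name__ == "__main__":
    print(describe_integration())
```

Checking with `find_spec` avoids importing the optional module just to learn whether it is present.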

Add Vaadin Maven Archetype to IntelliJ

On Windows, you can add the Maven archetype for Vaadin to IntelliJ IDEA as follows. Open c:\Users\username\.IdeaIC2017.3\system\Maven\Indices\UserArchetypes.xml and add the lines: <archetypes> <archetype groupId="com.vaadin" artifactId="vaadin-archetype-application" version="8.3.2" /> </archetypes> Now you can create a new Vaadin project via Maven.

Numpy linspace function

To create, for example, x-axis indices, you can use the linspace function from NumPy. You give it a range (e.g. 0 to 23) and the number of values you want, and it distributes the values evenly across that range. The stop value is included in the resulting array by default. Example: import numpy as np np.linspace(0,…
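
Since the example above is cut off, here is a separate small sketch of the behaviour described, with illustrative values of my own choosing (the range 0 to 1 and five points are assumptions, not the post's numbers). It uses only `np.linspace` and its `num` and `endpoint` parameters.

```python
import numpy as np

# Five evenly spaced values from 0 to 1; the stop value is included by default.
with_stop = np.linspace(0.0, 1.0, num=5)

# With endpoint=False the stop value is left out and the spacing shrinks accordingly.
without_stop = np.linspace(0.0, 1.0, num=5, endpoint=False)

print(with_stop)     # values 0, 0.25, 0.5, 0.75, 1
print(without_stop)  # values 0, 0.2, 0.4, 0.6, 0.8
```

Note that `num` counts the values returned, not the gaps between them: five values including both ends give four intervals.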

Data Science Overview

Questions: Data Science tries to answer one of the following questions:
Classification -> “Is it A or B?”
Clustering -> “Are there groups which belong together?”
Regression -> “How will it develop in the future?”
Association -> “What often happens together?”
There are two ways to tackle these problem domains with machine learning:…

Removing pyc files on server

Sometimes Python gives you a hard time when you deploy code to a server after changing the directory structure or simply moving files. With the following command you can remove the .pyc files in the working directory and its subdirectories: find . -name \*.pyc -delete
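
If `find` is not available on the machine (for instance on a Windows server), roughly the same cleanup can be sketched in Python with `pathlib`. This is an illustrative equivalent under that assumption, not the command from the post, and the function name `remove_pyc_files` is hypothetical.

```python
from pathlib import Path


def remove_pyc_files(root="."):
    """Delete every *.pyc file below root and return the paths that were removed."""
    removed = []
    # Collect the matches first so that deleting does not disturb the directory walk.
    for path in list(Path(root).rglob("*.pyc")):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed


if __name__ == "__main__":
    for path in remove_pyc_files("."):
        print("removed", path)
```

Returning the removed paths makes it easy to log what was cleaned up during a deployment.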