2021 – Advent of Code – Day 3

Part 1: You need to use the binary numbers in the diagnostic report to generate two new binary numbers (called the gamma rate and the epsilon rate). The power consumption can then be found by multiplying the gamma rate by the epsilon rate. Each bit in the gamma rate can be determined by finding the…
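The excerpt above is cut off, so here is only a minimal sketch of the idea. It assumes (as in the published puzzle) that each gamma bit is the most common bit in that column of the report and each epsilon bit is the least common one. The function name `power_consumption`, the tie-breaking rule and the tiny `sample_report` are my own illustrative choices, not part of the original post.

```python
from collections import Counter


def power_consumption(report):
    """Return gamma rate * epsilon rate for a list of equal-length bit strings."""
    gamma_bits = []
    epsilon_bits = []
    for column in zip(*report):  # one tuple per bit position
        counts = Counter(column)
        # Assumption: gamma takes the most common bit, epsilon the least common.
        # Ties go to "1" here, which is an arbitrary choice for this sketch.
        most_common = "1" if counts["1"] >= counts["0"] else "0"
        gamma_bits.append(most_common)
        epsilon_bits.append("0" if most_common == "1" else "1")
    gamma = int("".join(gamma_bits), 2)
    epsilon = int("".join(epsilon_bits), 2)
    return gamma * epsilon


# Hypothetical three-line report: gamma = 101 (5), epsilon = 010 (2)
sample_report = ["101", "111", "001"]
print(power_consumption(sample_report))  # 10
```

Transposing the report with `zip(*report)` keeps the loop focused on one bit position at a time, which is all this part of the puzzle needs.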

2021 – Advent of Code – Day 2

Part 1: Today the puzzle got a bit trickier than Day 1. The submarine seems to already have a planned course (your puzzle input). You should probably figure out where it’s going. For example:

    forward 5
    down 5
    forward 8
    up 3
    down 8
    forward 2

Your horizontal position and depth both start at 0.…
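A minimal sketch of how the example course could be followed. It assumes that `forward` adds to the horizontal position, `down` increases the depth and `up` decreases it, which is how the published puzzle describes the commands. The function name `follow_course` is hypothetical.

```python
def follow_course(commands):
    """Return (horizontal, depth) after applying commands like 'forward 5'."""
    horizontal = 0  # both values start at 0, as stated in the puzzle
    depth = 0
    for line in commands:
        direction, amount = line.split()
        amount = int(amount)
        if direction == "forward":
            horizontal += amount
        elif direction == "down":
            depth += amount  # assumption: going down increases depth
        elif direction == "up":
            depth -= amount  # assumption: going up decreases depth
        else:
            raise ValueError(f"unknown command: {line!r}")
    return horizontal, depth


course = ["forward 5", "down 5", "forward 8", "up 3", "down 8", "forward 2"]
horizontal, depth = follow_course(course)
print(horizontal, depth)  # 15 10
```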

New Blog Post

k-fold cross-validation with sklearn

    from sklearn.model_selection import KFold
    kf = KFold(n_splits=2)
    kf.split(df_train)
    step = 0  # set counter to 0
    for train_index, val_index in kf.split(df_train):  # for each fold
        step = step + 1  # update counter
        print('Step ', step)
        features_fold_train = df_train.iloc[train_index, [4, 5]]  # features matrix of training data (of this step)
        features_fold_val = df_train.iloc[val_index, [4, 5]]…
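To see which row positions end up in `train_index` and `val_index`, here is a small pure-Python sketch of an unshuffled k-fold split. It is not sklearn's implementation, only an illustration that follows the documented behaviour of `KFold` without shuffling: consecutive folds, with the first `n_samples % n_splits` folds getting one extra sample. The helper name `kfold_indices` is hypothetical.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_index, val_index) lists for consecutive, unshuffled folds."""
    # Base fold size, with the remainder spread over the first folds
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1

    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        val_index = indices[start:start + size]  # this fold is held out
        train_index = indices[:start] + indices[start + size:]  # the rest trains
        yield train_index, val_index
        start += size


for step, (train_index, val_index) in enumerate(kfold_indices(5, 2), start=1):
    print("Step", step, "train:", train_index, "val:", val_index)
# Step 1 train: [3, 4] val: [0, 1, 2]
# Step 2 train: [0, 1, 2] val: [3, 4]
```

Every row is used for validation exactly once, which is the point of cross-validation.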

Pandas Cheat Sheet

If you are new to pandas, feel free to read Introduction to Pandas. I’ve assembled some pandas code snippets.

Reading Data

Reading CSV

    import pandas as pd
    # read from csv
    df = pd.read_csv("path_to_file")

The file can also be a plain text file; the file suffix is ignored.

Data Science Pipeline

Motivation

Learning data science can be grueling and overwhelming sometimes. When I feel too overwhelmed, it’s time to draw a picture. This is my current overview of what a data scientist has to do:

General tools
    Linear Algebra with numpy – Part 1
    numpy random choice
    Numpy linspace function

Data acquisition
    Data Science Datasets: Iris flower…

Introduction to matplotlib – Part 3

After laying the foundation in Introduction to matplotlib and Introduction to matplotlib – Part 2, I want to show you another important chart type.

Bar Charts

A bar chart is useful for showing total values over time, e.g. the revenue of a company.

    years = (2017, 2018, 2019)
    revenue = (5000, 7000, 9000)
    plt.bar(years, revenue, width=0.35)…

Linear Regression with sklearn – cheat sheet

    # import and instantiate model
    from sklearn.linear_model import LinearRegression
    model = LinearRegression()

    # prepare training data (the features must be 2-D, hence the list of column names)
    features_train = df_train.loc[:, ['feature_name']]
    target_train = df_train.loc[:, 'target_name']

    # fit (train) model and print coefficient and intercept
    model.fit(features_train, target_train)
    print(model.coef_)
    print(model.intercept_)

    # calculate model quality
    from sklearn.metrics import mean_squared_error
    from sklearn.metrics import r2_score
    target_prediction = model.predict(features_train)
    print(mean_squared_error(target_train, target_prediction))…
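If you want to check what the coefficient, intercept and quality metrics mean, the following pure-Python sketch computes them by hand for a single feature. It uses the standard closed-form least-squares formulas and is not how sklearn computes them internally. All function names and the sample data are made up for illustration.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up data that lies exactly on y = 2x + 1
feature = [1, 2, 3, 4]
target = [3, 5, 7, 9]

slope, intercept = fit_line(feature, target)
prediction = [slope * xi + intercept for xi in feature]
print(slope, intercept)  # 2.0 1.0
print(mean_squared_error(target, prediction))  # 0.0
print(r2_score(target, prediction))  # 1.0
```

For a single feature, `slope` plays the role of the one entry in `model.coef_` and `intercept` the role of `model.intercept_`.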

Introduction to matplotlib – Part 2

When you finished reading part 1 of the introduction, you might have wondered how to draw more than one line or curve into one plot. I will show you now. To make it a bit more interesting, we generate two functions: sine and cosine. We generate our x-values with numpy’s linspace function: import numpy as…