Python Archives - Creatronix

2021 – Advent of code – Day 2

Jörn — Fri, 03 Dec 2021 11:23:34 +0000

Part 1

Today the puzzle got a bit trickier than Day 1.

The submarine seems to already have a planned course (your puzzle input). You should probably figure out where it's going. For example:

    forward 5
    down 5
    forward 8
    up 3
    down 8
    forward 2

Your horizontal position and depth both start at 0. The steps above would then modify them as follows:

    forward 5 adds 5 to your horizontal position, a total of 5.
    down 5 adds 5 to your depth, resulting in a value of 5.
    forward 8 adds 8 to your horizontal position, a total of 13.
    up 3 decreases your depth by 3, resulting in a value of 2.
    down 8 adds 8 to your depth, resulting in a value of 10.
    forward 2 adds 2 to your horizontal position, a total of 15.

After following these instructions, you would have a horizontal position of 15 and a depth of 10. (Multiplying these together produces 150.)

Calculate the horizontal position and depth you would have after following the planned course. What do you get if you multiply your final horizontal position by your final depth?

So, Pandas, here we go again:

import pandas as pd

df = pd.read_csv("./aoc_day_02_data.txt", delimiter=" ", header=None)
df.columns = ["command", "value"]

Alright, reading in the data and naming the columns are the same steps as yesterday. Now we have two columns:

	0	1
0	forward	5
1	down	5
2	forward	8
3	up	3
4	down	8
5	forward	2

horizontal = df[df['command']=="forward"]["value"].sum()

The horizontal position can be calculated with the sum function after filtering the data frame to the rows where the command is “forward”.

depth = df[df['command']=="down"]["value"].sum() - df[df['command']=="up"]["value"].sum()

The depth can be calculated by summing up the down and up commands separately and subtracting the up sum from the down sum.

Now we have to multiply the depth and the horizontal position to get the solution:

position = depth * horizontal
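If you want to check the filter-and-sum logic without pandas, it can be sketched in plain Python. The function name `part_one` and the inline sample (copied from the puzzle text above) are my own choices for illustration, not part of the original solution.

```python
def part_one(lines):
    """Return horizontal position * depth under the Part 1 rules."""
    commands = [line.split() for line in lines]
    # same idea as the pandas filter: sum the values per command
    horizontal = sum(int(v) for c, v in commands if c == "forward")
    down = sum(int(v) for c, v in commands if c == "down")
    up = sum(int(v) for c, v in commands if c == "up")
    depth = down - up
    return horizontal * depth


SAMPLE = ["forward 5", "down 5", "forward 8", "up 3", "down 8", "forward 2"]
print(part_one(SAMPLE))  # → 150
```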

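Part 2

Part 2 introduces a third value, aim, which also starts at 0. The commands now mean something different: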

    down X increases your aim by X units.
    up X decreases your aim by X units.
    forward X does two things:
        It increases your horizontal position by X units.
        It increases your depth by your aim multiplied by X.

Now the example above plays out differently:

    forward 5 adds 5 to your horizontal position, a total of 5. Because your aim is 0, your depth does not change.
    down 5 adds 5 to your aim, resulting in a value of 5.
    forward 8 adds 8 to your horizontal position, a total of 13. Because your aim is 5, your depth increases by 8*5=40.
    up 3 decreases your aim by 3, resulting in a value of 2.
    down 8 adds 8 to your aim, resulting in a value of 10.
    forward 2 adds 2 to your horizontal position, a total of 15. Because your aim is 10, your depth increases by 2*10=20 to a total of 60.

After following these new instructions, you would have a horizontal position of 15 and a depth of 60. 
(Multiplying these produces 900.)

Using this new interpretation of the commands, calculate the horizontal position and depth you would have after following the planned course. 
What do you get if you multiply your final horizontal position by your final depth?

To get an overview I simplified the table

Here I had a hard time doing it with pandas, so vanilla Python to the rescue:

if __name__ == '__main__':

    with open("./aoc_day_02_data.txt") as file:
        # strip the trailing newline from every line
        lines = [line.rstrip() for line in file]

    horizontal = 0
    current_aim = 0
    depth = 0
    for line in lines:
        command, value = line.split(" ")
        value = int(value)
        if command == "forward":
            # forward moves ahead and dives by value * current aim
            horizontal += value
            depth += value * current_aim
        elif command == "down":
            current_aim += value
        elif command == "up":
            current_aim -= value

    print(f"horizontal: {horizontal}")
    print(f"depth: {depth}")
    print(horizontal * depth)
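To check the loop against the worked example without a data file, the same logic can be wrapped in a function that takes a list of lines. The name `part_two` and the inline sample are assumptions for illustration.

```python
def part_two(lines):
    """Return horizontal position * depth under the Part 2 (aim) rules."""
    horizontal = depth = aim = 0
    for line in lines:
        command, value = line.split()
        value = int(value)
        if command == "forward":
            horizontal += value
            depth += value * aim  # dive by the current aim
        elif command == "down":
            aim += value
        elif command == "up":
            aim -= value
    return horizontal * depth


SAMPLE = ["forward 5", "down 5", "forward 8", "up 3", "down 8", "forward 2"]
print(part_two(SAMPLE))  # → 900
```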

Update

I’ve figured out how to do it with Pandas as well:

import pandas as pd

df = pd.read_csv("./aoc_day_02_test_data.txt", delimiter=" ", header=None)
df.columns = ["command", "value"]

forward = df["command"] == "forward"
horizontal = df.loc[forward, "value"].sum()

# "up" lowers the aim, so flip the sign of its values
df.loc[df["command"] == "up", "value"] *= -1

# only "up" and "down" change the aim; the running total is the current aim
df["aim"] = 0
df.loc[~forward, "aim"] = df.loc[~forward, "value"]
df["current_aim"] = df["aim"].cumsum()

# every "forward" step dives by value * current aim
df.loc[forward, "depth"] = df.loc[forward, "value"] * df.loc[forward, "current_aim"]
depth = df["depth"].sum()
print(horizontal * depth)
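The trick in the pandas version is that the current aim is just a running sum of the aim changes. That idea can be mirrored in plain Python with `itertools.accumulate`, which plays the role of `cumsum()` here. This is a sketch for comparison; the function name and sample are mine.

```python
from itertools import accumulate


def part_two_cumsum(lines):
    pairs = [(c, int(v)) for c, v in (line.split() for line in lines)]
    # aim change per row: +value for down, -value for up, 0 for forward
    aim_deltas = [v if c == "down" else -v if c == "up" else 0 for c, v in pairs]
    # running total of the changes = current aim, like df["aim"].cumsum()
    current_aim = list(accumulate(aim_deltas))
    horizontal = sum(v for c, v in pairs if c == "forward")
    depth = sum(v * aim for (c, v), aim in zip(pairs, current_aim) if c == "forward")
    return horizontal * depth


SAMPLE = ["forward 5", "down 5", "forward 8", "up 3", "down 8", "forward 2"]
print(part_two_cumsum(SAMPLE))  # → 900
```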

The post 2021 – Advent of code – Day 2 appeared first on Creatronix.

2021 – Advent of code – Day 1

Jörn — Thu, 02 Dec 2021 09:30:22 +0000

I haven’t participated in Advent of Code before, but I’ve always been curious.

What is advent of code?

It’s an advent calendar for programmers: you get 25 challenges, one per day starting December 1st. Caveat: each challenge has two parts, and you have to solve the first part to unlock the second 🙂

Day 1 Challenge – Part 1

On the first day, your task is to count how many times a value is bigger than its predecessor. They give us some sample data:

199 N/A
200 bigger
208 bigger
210 bigger
200 smaller
207 bigger
240 bigger
269 bigger
260 smaller
263 bigger

Counting the rows marked “bigger” gives seven increases.
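The counting rule itself fits in a few lines of plain Python: pair each value with its successor and count the pairs where the successor is larger. The function name `count_increases` is my own; the sample is the one from the puzzle.

```python
def count_increases(values):
    # pair each value with its successor, like shifting the column by one
    return sum(later > earlier for earlier, later in zip(values, values[1:]))


SAMPLE = [199, 200, 208, 210, 200, 207, 240, 269, 260, 263]
print(count_increases(SAMPLE))  # → 7
```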

The actual data contains 2000 rows. This isn’t exactly big data, but I wanted to dust off my Pandas skills, so here we go:

Let’s look at the data

import pandas as pd

df = pd.read_csv("./aoc_day_01_data.txt", header=None)
df.describe()

With the read_csv() function we can read in our data file and convert it into a data frame. It’s important to pass header=None; otherwise pandas treats the first row as the column header.

df.describe() gives us:

Because we want to reference the columns by name, we add a column header:

df.columns = ["original"]

To compare the nth cell with the (n+1)th cell, we add a new column containing the original values shifted up by one row:

df['shifted'] = df['original'].shift(-1)

The output looks like this:

	original	shifted
0	159	158.0
1	158	174.0
2	174	196.0
3	196	197.0
4	197	194.0
…	…	…
1995	8538	8543.0
1996	8543	8545.0
1997	8545	8557.0
1998	8557	8568.0
1999	8568	NaN

We add another column where we place the value True when the value from the current row in the shifted column is bigger than in the original column:

df['increased'] = (df['shifted'] > df['original'])

Now it starts to look like the sample data from the introduction:

	original	shifted	increased
0	159	158.0	False
1	158	174.0	True
2	174	196.0	True
3	196	197.0	True
4	197	194.0	False
…	…	…	…
1995	8538	8543.0	True
1996	8543	8545.0	True
1997	8545	8557.0	True
1998	8557	8568.0	True
1999	8568	NaN	False

The last thing we have to do is count how many times True occurs:

true_count = df['increased'].sum()

which gives us “1583”

This is a bit of a hack because it relies on True counting as 1 and False as 0.
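For what it's worth, in Python this assumption is guaranteed by the language: bool is a subclass of int, so True == 1 and False == 0. A tiny sketch with made-up flags:

```python
flags = [True, False, True, True]

# bool is a subclass of int, so True == 1 and False == 0 by definition
print(isinstance(True, int))  # → True
print(sum(flags))             # → 3
```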

A more elegant solution is to use value_counts:

df['increased'].value_counts(dropna=False)

Now the output is:

True     1583
False     417
Name: increased, dtype: int64

And 1583 is the number we are looking for. This earned us our first golden star and unlocked the second part of the challenge:

Part 2

The second part is a bit more challenging: we have to sum up three adjacent values and compare each sum with the sum of the next window, which starts one value later.

199  A       
200  A B     
208  A B C   
210    B C D
200  E   C D
207  E F   D
240  E F G
269    F G H
260      G H
263        H

I created a new notebook and started as in part 1 by reading the data and naming the first column:

import pandas as pd

df = pd.read_csv("./aoc_day_01_data.txt", header=None)
df.columns = ["original"]

To add the sum of three values to the row of the first value, we use the following code:

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=3)
df["rolling_sum"] = df.original.rolling(window=indexer).sum()

This demonstrates the power of Pandas once more: it has built-in sliding window functions!

The rest is the same as in part one: shift, compare and count.

df['shifted_rs'] = df['rolling_sum'].shift(-1)
df['increased_rs'] = (df['shifted_rs'] > df['rolling_sum'])
true_count = df['increased_rs'].sum()
true_count

As a little finger exercise, I did the same with vanilla Python:

data = []
with open("./aoc_day_01_test_data.txt") as f:
    for line in f:
        data.append(int(line.rstrip()))

triplet_sums = []

for i, v in enumerate(data):
    if i < (len(data) - 2):
        triplet_sum = data[i] + data[i+1] + data[i+2]
        triplet_sums.append(triplet_sum)
print(triplet_sums)

sums_larger_than_previous_sums = 0
for i, v in enumerate(triplet_sums):
    if i < (len(triplet_sums) - 1):
        if triplet_sums[i] < triplet_sums[i+1]:
            sums_larger_than_previous_sums += 1

print(sums_larger_than_previous_sums)

Which works but is less elegant.
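A more compact vanilla version can lean on slicing and zip. The second function uses an observation of mine: two neighbouring windows share two values, so comparing their sums is the same as comparing the first value of the earlier window with the last value of the later one. Both names are hypothetical; this is a sketch, not the code from the notebook.

```python
def count_window_increases(values, size=3):
    # sum of every window of `size` consecutive values
    sums = [sum(values[i:i + size]) for i in range(len(values) - size + 1)]
    return sum(later > earlier for earlier, later in zip(sums, sums[1:]))


def count_window_increases_fast(values, size=3):
    # neighbouring windows share size - 1 values, so only values[i] and
    # values[i + size] differ between them
    return sum(later > earlier for earlier, later in zip(values, values[size:]))


SAMPLE = [199, 200, 208, 210, 200, 207, 240, 269, 260, 263]
print(count_window_increases(SAMPLE))       # → 5
print(count_window_increases_fast(SAMPLE))  # → 5
```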

Stay tuned for more!


Python Argparse

Jörn — Fri, 19 Mar 2021 12:55:14 +0000

Primer: Arguments vs Parameters

I’m sometimes confused. Is it an argument or a parameter? So here it goes:

A parameter is a variable in a function definition.

When a function is called, the arguments are the data you pass into the function’s parameters.

The same goes for a program: A program has parameters, you call a program with arguments.

Argparse Module

Batteries included! The argparse module provides a nice way to parse your program arguments:

import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('positional_param')
    parser.add_argument('-o', '--optional_param')
    parser.add_argument('-r', '--required_param', required=True)
    args = parser.parse_args()
    print(args.positional_param)
    print(args.optional_param)
    print(args.required_param)

Positional Parameters

Positional arguments are useful for shorter programs. These parameters do not have a name. You could call them “unnamed parameters” as well.

It is important that you keep the right sequence when calling the program:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('positional_param_one')
parser.add_argument('positional_param_two')
args = parser.parse_args()
print(args.positional_param_one)
print(args.positional_param_two)

$ python positional_args.py foo bar
foo
bar
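To try the ordering rule without a shell, parse_args() also accepts an explicit list of argument strings instead of reading sys.argv. A minimal sketch reusing the two positional parameters from above:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('positional_param_one')
parser.add_argument('positional_param_two')

# parse_args() accepts a list instead of reading sys.argv
args = parser.parse_args(['foo', 'bar'])
print(args.positional_param_one, args.positional_param_two)  # → foo bar

# swapping the arguments swaps the values: only the position counts
args = parser.parse_args(['bar', 'foo'])
print(args.positional_param_one, args.positional_param_two)  # → bar foo
```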

Optional Parameters

Optional parameters have a name that you have to use when passing an argument to the program:

parser.add_argument('--optional_param')

The leading double dash marks an optional parameter.

You can provide a shorthand for an optional parameter as well

parser.add_argument('-o', '--optional_param')

When you leave out an optional argument, its value defaults to None.

Two ways of calling a program with optional arguments:

$ python optional_args.py
None

$ python optional_args.py --optional_param foobar
foobar
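If None is not a good fallback, add_argument() takes a default= value. The sketch below parses explicit argument lists; the extra `--level` parameter is a hypothetical example of mine.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-o', '--optional_param')
# hypothetical parameter: default= replaces None as the fallback value
parser.add_argument('--level', default='info')

args = parser.parse_args([])
print(args.optional_param, args.level)  # → None info

# the short form -o and the long form --optional_param are interchangeable
args = parser.parse_args(['-o', 'foobar', '--level', 'debug'])
print(args.optional_param, args.level)  # → foobar debug
```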

Required “Optional” Parameters

Now it gets a bit weird: You can make an optional parameter required.

(another strong argument that these should be named “named parameters”)

parser.add_argument('-r', '--required_param', required=True)

If you do not provide the argument, you will get:

$ python required_params.py
usage: required_params.py [-h] -r REQUIRED_PARAM
required_params.py: error: the following arguments are required: -r/--required_param

So better provide the parameter:

$ python required_params.py -r foobar
foobar
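Under the hood, argparse reports the missing argument by printing the usage and error message to stderr and raising SystemExit with exit status 2. A small sketch that catches it, which can be handy in tests:

```python
import argparse

parser = argparse.ArgumentParser(prog='required_params.py')
parser.add_argument('-r', '--required_param', required=True)

try:
    parser.parse_args([])  # no -r given
except SystemExit as exc:
    # argparse has already printed the error message to stderr
    print(exc.code)  # → 2

print(parser.parse_args(['-r', 'foobar']).required_param)  # → foobar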

Command Line Help

When you start your script with the -h or --help parameter, you get a nice overview of the usage:

$ python main.py -h
usage: main.py [-h] [-o OPTIONAL_PARAM] -r REQUIRED_PARAM positional_param

positional arguments:
  positional_param

optional arguments:
  -h, --help            show this help message and exit
  -o OPTIONAL_PARAM, --optional_param OPTIONAL_PARAM
  -r REQUIRED_PARAM, --required_param REQUIRED_PARAM


Pandas Cheat Sheet

Jörn — Fri, 05 Mar 2021 13:17:19 +0000

If you are new to Pandas, feel free to read Introduction to Pandas.

I’ve assembled some pandas code snippets

Reading Data

Reading CSV

import pandas as pd

# read from csv
df = pd.read_csv("path_to_file")

The file can also be a plain text file; the file suffix is ignored.

The default delimiter for comma-separated value files is the comma. If your data uses another delimiter, you can specify it via:

delimiter=";"

If your data has no header you can pass header=None into the function

df = pd.read_csv("./aoc_day_01_data.txt", header=None)

With skiprows you can skip a number of lines at the start of the file:

skiprows=8

Sometimes you need to alter the encoding as well:

encoding="cp1252"

Reading Excel

You can read Excel files as well, but you need to install openpyxl first:

pip install openpyxl

df = pd.read_excel("./my_excel_sheet.xlsx")

With sheet_name you can select the individual sheet:

sheet_name="my_sheet_1"

Inspecting data

Summary statistics

df.describe()

Length

len(df)

Showing entries

df.head()

df.tail(10)

Indexing

df['A']

gives you column A

iloc selects entries by their integer position:

#      [row, column]
df.iloc[0,   0]

loc selects entries by label:

#     [row, column]
df.loc[:, :]

Data Cleaning

Dropping columns

del df["column_name"]

Renaming columns

df.columns = ["new_column_name", ...]

Comparing columns

df['increased'] = (df['shifted'] > df['original'])

Shifting columns

df['shifted'] = df['original'].shift(-1)

Splitting

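Splitting strings into individual characters, one column per character:

# str.split('') would add empty columns at both ends, so split via list()
df = pd.DataFrame(df["original"].apply(list).tolist())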

Counting and Calculating

Summing columns

df["value"].sum()

Cumulative sum

df["aim"].cumsum()

Rolling sum

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=3)
df["rolling_sum"] = df.original.rolling(window=indexer).sum()

Counting value occurrences

df['increased'].value_counts(dropna=False)

Counting occurrences for all columns

df = pd.concat([df[column].value_counts() for column in df], axis = 1)

Convert column to datetime

df.loc[:, 'date'] = pd.to_datetime(df.loc[:, 'date'])

Convert datetime to minutes since midnight

df.loc[:, 'msm'] = df.loc[:, "date"].dt.hour * 60 + df.loc[:, "date"].dt.minute
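The minutes-since-midnight arithmetic is the same for a single Python datetime; this stdlib sketch (function name mine) shows it for one timestamp. Like the pandas line, it ignores seconds.

```python
from datetime import datetime


def minutes_since_midnight(ts):
    # same arithmetic as the pandas line: hours * 60 + minutes
    return ts.hour * 60 + ts.minute


print(minutes_since_midnight(datetime(2021, 3, 5, 13, 17, 19)))  # → 797
```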


SQLite3: Python and SQL

Jörn — Thu, 02 Apr 2020 08:57:45 +0000

Everything we did in the last articles of the series SQL-Tutorial was a dry run because we just used SQLFiddle.

So let’s start with a real database like SQLite.

SQLite is a file-based relational database management system (RDBMS) and can be used for, e.g., websites. The official docs say:

“SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites). [..] Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite.”

Because Knight Industries is not Google, Amazon or Facebook, we can definitely use SQLite.

Creating and connecting to a database

In Python it is pretty easy to connect to a SQLite database:

from sqlite3 import connect 
db_connection = connect('knight_industries.db')

If the file knight_industries.db does not exist, it will be created automagically. A nice little feature of the sqlite3 library.

But be careful: if you already have a database file and you mess up the path in the connect statement, you will wonder why you cannot access your data, because a new, empty file is created silently.
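As a side note on that pitfall: sqlite3 can open a database through a URI, and mode=rw opens it for reading and writing without ever creating it, so a wrong path raises sqlite3.OperationalError instead. A sketch; the helper name `connect_existing` is hypothetical.

```python
import sqlite3
from pathlib import Path


def connect_existing(path):
    # mode=rw opens read/write but never creates the file, so a wrong
    # path raises sqlite3.OperationalError instead of a silent new file
    uri = Path(path).resolve().as_uri() + "?mode=rw"
    return sqlite3.connect(uri, uri=True)


# db_connection = connect_existing("knight_industries.db")
```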

cursor = db_connection.cursor()
cursor.execute('''CREATE TABLE operatives (id INTEGER, name TEXT, birthday DATE)''')
# in SQL, double quotes mark identifiers; ? placeholders let sqlite3 quote values
cursor.execute('''INSERT INTO operatives (id, name, birthday)
                  VALUES (?, ?, ?)''', (1, "Michael Arthur Long", "1949-01-09"))

db_connection.commit()
cursor.execute('''SELECT * FROM operatives''')
print(cursor.fetchone())
db_connection.close()
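For experiments that should not leave a file behind, sqlite3 also accepts ":memory:" as the database name. This sketch (function name mine) creates the same table in RAM, inserts rows with a parameterized executemany() and reads them back; the try/finally makes sure the connection is closed.

```python
import sqlite3


def roundtrip(rows):
    # ":memory:" keeps the database in RAM: nothing is written to disk
    db_connection = sqlite3.connect(":memory:")
    try:
        cursor = db_connection.cursor()
        cursor.execute("CREATE TABLE operatives (id INTEGER, name TEXT, birthday DATE)")
        # one parameterized INSERT per tuple; sqlite3 handles the quoting
        cursor.executemany(
            "INSERT INTO operatives (id, name, birthday) VALUES (?, ?, ?)", rows
        )
        db_connection.commit()
        cursor.execute("SELECT * FROM operatives ORDER BY id")
        return cursor.fetchall()
    finally:
        db_connection.close()


print(roundtrip([(1, "Michael Arthur Long", "1949-01-09")]))
# → [(1, 'Michael Arthur Long', '1949-01-09')]
```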


Intro to OpenCV with Python

Jörn — Mon, 23 Jul 2018 19:11:18 +0000

Installation

To work with OpenCV from Python, you need to install it first. We install numpy and matplotlib as well:

pip install opencv-python numpy matplotlib

Reading Images from file

After we import cv2 we can directly work with images like so:

import cv2 
img = cv2.imread("doc_brown.png")

To show the image, we use matplotlib:

import matplotlib.pyplot as plt 
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) 
plt.imshow(img) 
plt.show()

OpenCV stores images internally in BGR order (blue, green, red), while matplotlib expects RGB, so we have to convert them to RGB before displaying them.

Image Shape

We now have an image object (a numpy array) which can already tell us more about the image itself:

print(img.shape) 
(205, 236, 3)

The tuple shows us the number of (rows, columns, channels)

Manipulate brightness

import numpy as np 
brightness = np.zeros(img.shape, dtype="uint8") + 30 
img = cv2.add(img, brightness)

Manipulating brightness works like this: create a numpy array with the same shape as the image, fill it with the brightness offset and add it to the original image with cv2.add.
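Why cv2.add instead of a plain img + brightness? For uint8 images, cv2.add saturates at 255, whereas numpy's + on uint8 arrays wraps around modulo 256, which would turn very bright pixels dark. A stdlib sketch of the two behaviours for a single pixel value (both function names are mine):

```python
def saturating_add(a, b, max_value=255):
    # clip at the top of the uint8 range, the way cv2.add behaves
    return min(a + b, max_value)


def wrapping_add(a, b):
    # uint8 overflow wraps around modulo 256, like numpy's + on uint8
    return (a + b) % 256


print(saturating_add(240, 30))  # → 255
print(wrapping_add(240, 30))    # → 14
```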

Adding noise

Adding noise works the same way, but instead of adding a fixed value to every pixel, you add random values. Note that np.random.randint draws uniformly distributed values, not normally distributed ones.

noise = np.random.randint(0, 100, size=img.shape, dtype="uint8") 
img = cv2.add(img, noise)

Smoothing

Smoothing reduces the noise of an image by applying a Gaussian blur:

img = cv2.GaussianBlur(img, ksize=(31, 31), sigmaX=5)
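To get a feeling for what ksize and sigmaX control, here is a sketch of the 1-D weights, using the formula documented for cv2.getGaussianKernel: weight_i is proportional to exp(-(i - (ksize - 1)/2)^2 / (2 * sigma^2)), scaled so the weights sum to 1. The 2-D blur applies such a kernel along rows and columns. The function name is hypothetical and this is not OpenCV's implementation.

```python
import math


def gaussian_kernel_1d(size, sigma):
    """Normalized 1-D Gaussian weights for an odd kernel size."""
    if size % 2 == 0:
        raise ValueError("kernel size must be odd")
    center = (size - 1) / 2
    weights = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(size)]
    total = sum(weights)
    # normalize so the blur keeps the overall brightness unchanged
    return [w / total for w in weights]


kernel = gaussian_kernel_1d(31, sigma=5)
print(round(sum(kernel), 6))  # → 1.0
```

A larger sigma spreads the weight over more neighbours (stronger blur); ksize only limits how far the kernel reaches.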

Gray-scale conversion

Last but not least, we can convert an image to grayscale. Beware that for showing or saving the image with matplotlib you need to pass cmap='gray':

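# img was converted to RGB for matplotlib above, so RGB2GRAY matches its
# channel order; use cv2.COLOR_BGR2GRAY on an image fresh from cv2.imread()
img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
plt.imshow(img, cmap='gray')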
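Channel order matters here because the conversion weights the channels differently. OpenCV documents the gray value as 0.299 R + 0.587 G + 0.114 B, so swapping R and B changes the result. A per-pixel sketch of that formula (OpenCV itself uses fixed-point arithmetic, so results may differ by rounding):

```python
def rgb_to_gray(r, g, b):
    # luminance weights from OpenCV's documented RGB -> gray formula
    return round(0.299 * r + 0.587 * g + 0.114 * b)


print(rgb_to_gray(255, 0, 0))  # → 76  (pure red)
print(rgb_to_gray(0, 0, 255))  # → 29  (pure blue)
```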

Have fun fiddling around with OpenCV!

Github Repo

https://github.com/jboegeholz/introduction_to_opencv


Linear Algebra with numpy

Jörn — Fri, 04 May 2018 11:42:22 +0000

Numpy is a package for scientific computing in Python. It is very fast because its core is implemented in C.

It is often used together with pandas, matplotlib and Jupyter notebooks. These packages are often referred to as the data science stack.

Installation

You can install numpy via pip

pip install numpy

Basic Usage

In the data science world, numpy is usually imported like this:

import numpy as np

The “as” keyword defines a so-called alias. Now you can use structures from numpy by referencing them with “np” instead of the whole name.

Think “abbreviation”.

n-dimensional array

The most important data structure is ndarray, which is short for n-dimensional array.

You can convert a list to a numpy array with the np.array function:

my_list = [1, 2, 3, 4] 
my_array = np.array(my_list)

You can also convert an array back to a list with

my_new_list = my_array.tolist()

You can retrieve the dimensionality of an array with the ndim property:

my_array.ndim

and get the size along each dimension with the shape property:

my_array.shape

Vector arithmetic

Addition / Subtraction

a = np.array([1, 2, 3, 4]) 
b = np.array([4, 3, 2, 1]) 
a + b 
array([5, 5, 5, 5]) 

a - b 
array([-3, -1, 1, 3])

Scalar Multiplication

a = np.array([1, 2, 3, 4]) 
a * 3 

array([3, 6, 9, 12])

To see why it is charming to use numpy arrays for this operation, consider the plain-Python alternative:

c = [1,2,3,4] 
d = [x * 3 for x in c]

Dot Product

a = np.array([1,2,3,4]) 
b = np.array([4,3,2,1]) 
a.dot(b) 

20 # 1*4 + 2*3 + 3*2 + 4*1
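In the same spirit as the list comprehension above, the dot product can be written in plain Python: multiply the components pairwise and add up the products. A sketch with a length check that numpy would also enforce (as an error on mismatched shapes):

```python
def dot(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # multiply component-wise, then add up the products
    return sum(x * y for x, y in zip(a, b))


print(dot([1, 2, 3, 4], [4, 3, 2, 1]))  # → 20
```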

Learn more about numpy:

numpy random choice

Numpy linspace function

Project on github


pip optional dependencies

Jörn — Mon, 16 Apr 2018 12:11:48 +0000

Sometimes you want to make your Python package usable in different situations, e.g. with Flask, Bottle or Django.

If you want to keep the required dependencies to a minimum, you can declare optional dependencies (extras) in setup.py:

extras_require={ 
    'flask': ['Flask>=0.8', 'blinker>=1.1'] 
}

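Now you can install the library together with its Flask extras:

pip install raven[flask]

In shells like zsh the brackets have to be quoted: pip install "raven[flask]"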
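Inside the package, code that needs an extra usually has to cope with the extra not being installed. One common pattern is to check for the module before enabling an integration; this sketch uses importlib.util.find_spec, and the helper name `has_module` is hypothetical.

```python
import importlib.util


def has_module(name):
    # find_spec returns None when a top-level module is not installed
    return importlib.util.find_spec(name) is not None


FLASK_AVAILABLE = has_module("flask")
if FLASK_AVAILABLE:
    pass  # register the Flask integration here
```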


My personal roadmap for learning data science in 2018

Jörn — Wed, 13 Dec 2017 14:05:14 +0000

I got confused by all the buzzwords: data science, machine learning, deep learning, neural nets, artificial intelligence, big data, and so on and so on.

As an engineer I like to put some structure to the chaos. Inspired by Roadmap: How to Learn Machine Learning in 6 Months and Tetiana Ivanova – How to become a Data Scientist in 6 months a hacker’s approach to career planning, I built my own learning roadmap for 2018.

So 2018 will be all about Data Science. Having heard about the Personal Knowledge Mastery concept at SWEC17, I am going to tackle the learning process on different levels.

Watch the Pros

Thanks to open course ware there are a ton of awesome university courses online e.g.:

MIT 6.0002 Introduction to Computational Thinking and Data Science

Learn the tools

There is already a whole bunch of tools that we can consider part of a standard data science stack. Because my main language is Python, the focus is, of course, mostly on Python modules.

Finishing Udacity / Udemy courses

To brush up my python skills and my knowledge of basic computer science I will finish some already started online courses:

- [ ] Introduction to Machine Learning
- [ ] Python Bootcamp
- [ ] Algorithms and Data Structures
- [ ] Introduction to Artificial Intelligence
- [ ] Introduction to computer vision
- [ ] Artificial Intelligence for Robotics

Reading data science books

To get a broad overview I bought two books on DS / ML

- [ ] Data Science from Scratch
- [ ] Hands on Machine Learning

Do Exercises on Kaggle

- [x] Create Account at Kaggle
- [ ] Do first exercise
- [ ] Participate in a contest

Visit Meetups about Data Science

- [ ] Visit Big Data Meetup Events

Add some Peer Pressure

My brother-in-law and I teamed up and built a WhatsApp group to learn and exchange ideas. We are currently four members.

Write Blog Articles

I will try to incorporate some of the stuff I’ve learned into blog articles.

I already did

So stay tuned!


Regular Expressions Demystified – A Mini DSL for Regex in Python

Jörn — Mon, 05 Jun 2017 12:36:31 +0000

Motivation

Every junior developer needs some pet projects to try out techniques he or she is not yet familiar with.

Because I’ve always had a hard time with regular expressions (I know that they are useful, but I use them so rarely that I cannot keep all the syntax in my head), I’ve started a little project to make RegEx easier to use.

What are Regular Expressions aka RegEx?

A regular expression is a sequence of characters that defines a search pattern in text.

Say you have an input string which contains spaces, a tab and a line break:

input_string = " \tJoernBoegeholz \n"

You will certainly agree that it wouldn’t be a good idea to use this string as, e.g., a username. If a username is needed to log into a system, a user will not remember whether they accidentally typed a whitespace character into the form field. So we have to remove the spaces, tabs and line breaks.

output = input_string.replace(" ", "") 
output = output.replace("\t", "") 
output = output.replace("\n", "")

This is a bit messy. With RegEx we can use the “\s” metacharacter instead:

import re

output = re.sub(r"\s", "", input_string)

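From the Python docs (this wording is from the Python 2 documentation):

“When the UNICODE flag is not specified, it matches any whitespace character, this is equivalent to the set [ \t\n\r\f\v].”

In Python 3, \s in a str pattern matches all Unicode whitespace by default; it is restricted to [ \t\n\r\f\v] only when the re.ASCII flag is set.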

Please take this just as an example: in production code you would use strip() to remove leading and trailing whitespace.
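The two approaches are not interchangeable, though: \s removes whitespace everywhere, while strip() only trims the ends. A small sketch with a hypothetical input that also contains an inner space:

```python
import re

name = " \tJoern Boegeholz \n"
# \s removes every whitespace character, including the one inside the name
print(repr(re.sub(r"\s", "", name)))  # → 'JoernBoegeholz'
# strip() only trims the ends and keeps the inner space
print(repr(name.strip()))             # → 'Joern Boegeholz'
```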

OK, here is the catch: I cannot remember the meta-characters. That makes working with RegEx cumbersome for me.

First step

Each metacharacter is represented as a constant:

ANY_CHAR = '.'
DIGIT = r'\d'
NON_DIGIT = r'\D'
WHITESPACE = r'\s'
NON_WHITESPACE = r'\S'
ALPHA = '[a-zA-Z]'
ALPHANUM = r'\w'
NON_ALPHANUM = r'\W'

Second Step

We wrap the quantifiers in convenience functions:

def zero_or_more(string):
    return string + '*'


def zero_or_once(string):
    return string + '?'


def one_or_more(string):
    return string + '+'

Third Step

As syntactic sugar we introduce a class which encapsulates the pattern:

class Pattern:

    def __init__(self):
        self.pattern = ''

    def starts_with(self, start_str):
        self.pattern += start_str
        return self

    def followed_by(self, next_string):
        self.pattern += next_string
        return self

    def __str__(self):
        return self.pattern

    def __repr__(self):
        return self.pattern

Result

Instead of writing

pattern = r"\d\D+\s{2,4}"

you can now write

pattern = Pattern()
pattern.starts_with(DIGIT)\
    .followed_by(one_or_more(NON_DIGIT))\
    .followed_by(between(2, 4, WHITESPACE))

which is more human-readable.
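The Result example also uses a between() helper that is not shown above. Here is a self-contained sketch of the whole idea with my own hypothetical between(min_count, max_count, string), inferred from how it is called; the published easy_pattern package may define it differently.

```python
import re

DIGIT = r'\d'
NON_DIGIT = r'\D'
WHITESPACE = r'\s'


def one_or_more(string):
    return string + '+'


def between(min_count, max_count, string):
    # hypothetical helper: appends a {min,max} quantifier
    return f"{string}{{{min_count},{max_count}}}"


class Pattern:
    def __init__(self):
        self.pattern = ''

    def starts_with(self, start_str):
        self.pattern += start_str
        return self

    def followed_by(self, next_string):
        self.pattern += next_string
        return self

    def __str__(self):
        return self.pattern


pattern = (Pattern()
           .starts_with(DIGIT)
           .followed_by(one_or_more(NON_DIGIT))
           .followed_by(between(2, 4, WHITESPACE)))

print(str(pattern))                                # → \d\D+\s{2,4}
print(bool(re.fullmatch(str(pattern), "7abc   ")))  # → True
```

One design caveat: the helpers simply append a quantifier, so it only applies to the last atom; for a multi-character literal such as 'ab' you would need a group like '(?:ab)'. Also note that \D also matches whitespace, so \D+ can swallow spaces that \s{2,4} was meant to catch.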

My first PyPI package

After using pip install for a couple of years, I wanted to know how to upload a new package to PyPI, the “Python Package Index”, so I’ve written another tutorial:

Distributing your own package on PyPi

At the moment it’s a pet project, but if you are interested, you can use the code via

pip install easy_pattern

Links

PyPi

Github
