<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Python Archives - Creatronix</title>
	<atom:link href="https://creatronix.de/tag/python/feed/" rel="self" type="application/rss+xml" />
	<link>https://creatronix.de/tag/python/</link>
	<description>My adventures in code &#38; business</description>
	<lastBuildDate>Tue, 23 Dec 2025 08:33:29 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>2021 – Advent of code – Day 2</title>
		<link>https://creatronix.de/2021-advent-of-code-day-2/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Fri, 03 Dec 2021 11:23:34 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[advent of code]]></category>
		<category><![CDATA[day 2]]></category>
		<category><![CDATA[pandas]]></category>
		<guid isPermaLink="false">https://creatronix.de/?p=4296</guid>

					<description><![CDATA[<p>Part 1 Today the puzzle got a bit trickier than Day 1. The submarine seems to already have a planned course (your puzzle input). You should probably figure out where it's going. For example: forward 5 down 5 forward 8 up 3 down 8 forward 2 Your horizontal position and depth both start at 0.&#8230;</p>
<p>The post <a href="https://creatronix.de/2021-advent-of-code-day-2/">2021 – Advent of code – Day 2</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h2>Part 1</h2>
<p>Today the puzzle got a bit trickier than <a href="https://creatronix.de/2021-advent-of-code-day-1/">Day 1</a>.</p>
<pre>The submarine seems to already have a planned course (your puzzle input). You should probably figure out where it's going. For example:

    forward 5
    down 5
    forward 8
    up 3
    down 8
    forward 2

Your horizontal position and depth both start at 0. The steps above would then modify them as follows:

    forward 5 adds 5 to your horizontal position, a total of 5.
    down 5 adds 5 to your depth, resulting in a value of 5.
    forward 8 adds 8 to your horizontal position, a total of 13.
    up 3 decreases your depth by 3, resulting in a value of 2.
    down 8 adds 8 to your depth, resulting in a value of 10.
    forward 2 adds 2 to your horizontal position, a total of 15.

After following these instructions, you would have a horizontal position of 15 and a depth of 10. (Multiplying these together produces 150.)

Calculate the horizontal position and depth you would have after following the planned course. What do you get if you multiply your final horizontal position by your final depth?</pre>
<p>So Pandas here we go again:<span id="more-4296"></span></p>
<pre>import pandas as pd

df = pd.read_csv("./aoc_day_02_data.txt", delimiter=" ", header=None)
df.columns = ["command", "value"]</pre>
<p>Alright, reading in the data and naming the columns are the same steps as yesterday. Now we have two columns:</p>
<table class="dataframe" border="1">
<thead>
<tr>
<th></th>
<th>command</th>
<th>value</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>forward</td>
<td>5</td>
</tr>
<tr>
<th>1</th>
<td>down</td>
<td>5</td>
</tr>
<tr>
<th>2</th>
<td>forward</td>
<td>8</td>
</tr>
<tr>
<th>3</th>
<td>up</td>
<td>3</td>
</tr>
<tr>
<th>4</th>
<td>down</td>
<td>8</td>
</tr>
<tr>
<th>5</th>
<td>forward</td>
<td>2</td>
</tr>
</tbody>
</table>
<pre>horizontal = df[df['command']=="forward"]["value"].sum()</pre>
<p>The horizontal position is the sum of the values in all rows where the command is &#8220;forward&#8221;.</p>
<pre>depth = df[df['command']=="down"]["value"].sum() - df[df['command']=="up"]["value"].sum()</pre>
<p>The depth is calculated by summing the down and up commands separately and subtracting the up sum from the down sum.</p>
<p>Now we multiply the depth and the horizontal position to get the solution:</p>
<pre>position = depth * horizontal</pre>
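<p>As a quick sanity check, the Part 1 steps can be run against the six sample commands from the puzzle text. This is only a sketch: it reads the sample from an in-memory string (my assumption, so that it runs without the input file) instead of aoc_day_02_data.txt.</p>

```python
import io

import pandas as pd

# The sample course from the puzzle description, inlined instead of read from disk.
sample = "forward 5\ndown 5\nforward 8\nup 3\ndown 8\nforward 2\n"

df = pd.read_csv(io.StringIO(sample), delimiter=" ", header=None)
df.columns = ["command", "value"]

# Sum the values per command type.
horizontal = df[df["command"] == "forward"]["value"].sum()
depth = df[df["command"] == "down"]["value"].sum() - df[df["command"] == "up"]["value"].sum()

print(horizontal, depth, horizontal * depth)
```

<p>On the sample this should reproduce the numbers from the puzzle text: a horizontal position of 15, a depth of 10 and a product of 150.</p>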
<h2>Part 2</h2>
<pre>    down X increases your aim by X units.
    up X decreases your aim by X units.
    forward X does two things:
        It increases your horizontal position by X units.
        It increases your depth by your aim multiplied by X.

Now, the above example does something different:

    forward 5 adds 5 to your horizontal position, a total of 5. Because your aim is 0, your depth does not change.
    down 5 adds 5 to your aim, resulting in a value of 5.
    forward 8 adds 8 to your horizontal position, a total of 13. Because your aim is 5, your depth increases by 8*5=40.
    up 3 decreases your aim by 3, resulting in a value of 2.
    down 8 adds 8 to your aim, resulting in a value of 10.
    forward 2 adds 2 to your horizontal position, a total of 15. Because your aim is 10, your depth increases by 2*10=20 to a total of 60.

After following these new instructions, you would have a horizontal position of 15 and a depth of 60. 
(Multiplying these produces 900.)

Using this new interpretation of the commands, calculate the horizontal position and depth you would have after following the planned course. 
What do you get if you multiply your final horizontal position by your final depth?</pre>
<p>To get an overview I simplified the table, with the aim in column a and the depth increase in column d:</p>
<pre>     a   d
f 5  0   0
d 5  5
f 8  5  40
u 3  2
d 8 10
f 2 10  20</pre>
<p>I had a hard time doing this with pandas, so vanilla Python to the rescue:</p>
<pre>if __name__ == '__main__':

    with open("./aoc_day_02_data.txt") as file:
        lines = file.readlines()
        lines = [line.rstrip() for line in lines]

    horizontal = 0
    current_aim = 0
    depth = 0
    for line in lines:
        print(line)
        command, value = line.split(" ")
        value = int(value)
        if command == "forward":
            horizontal += value
            depth += value * current_aim
        if command == "down":
            current_aim += value
        if command == "up":
            current_aim += value * -1

    print(f"horizontal: {horizontal}")
    print(f"depth: {depth}")
    print(horizontal * depth)</pre>
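<p>The same loop can be wrapped in a function so that it is easy to check against the sample course. This is a sketch; follow_course is a name I made up for illustration, and the sample lines are inlined rather than read from the input file.</p>

```python
def follow_course(lines):
    """Return (horizontal, depth) after applying the aim-based rules."""
    horizontal = aim = depth = 0
    for line in lines:
        command, value = line.split()
        value = int(value)
        if command == "forward":
            # forward moves horizontally and dives by aim * value
            horizontal += value
            depth += aim * value
        elif command == "down":
            aim += value
        elif command == "up":
            aim -= value
    return horizontal, depth


sample = ["forward 5", "down 5", "forward 8", "up 3", "down 8", "forward 2"]
horizontal, depth = follow_course(sample)
print(horizontal, depth, horizontal * depth)
```

<p>For the sample this gives a horizontal position of 15 and a depth of 60, which multiply to 900 as stated in the puzzle text.</p>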
<h2>Update</h2>
<p>I&#8217;ve figured out how to do it with Pandas as well</p>
<pre>import pandas as pd

df = pd.read_csv("./aoc_day_02_test_data.txt", delimiter=" ",header=None)
df.columns = ["command", "value"]

horizontal = df[df['command']=="forward"]["value"].sum()

# flip the sign of the "up" values so that they reduce the aim
df.loc[df['command']=="up", "value"] *= -1
df["aim"] = 0

df.loc[df['command']!="forward", "aim"] = df[df['command']!="forward"]["value"]
df["current_aim"] = df["aim"].cumsum()

df.loc[df['command']=="forward", "depth"] = df[df['command']=="forward"]["value"] * df[df['command']=="forward"]["current_aim"]
depth = df["depth"].sum()</pre>
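<p>A slightly more compact variant of the same idea, again only a sketch on the inlined sample data: instead of overwriting the values, it maps each command to a sign (a choice of mine, not part of the original solution) and takes the cumulative sum to get the aim per row.</p>

```python
import io

import pandas as pd

sample = "forward 5\ndown 5\nforward 8\nup 3\ndown 8\nforward 2\n"
df = pd.read_csv(io.StringIO(sample), delimiter=" ", header=None)
df.columns = ["command", "value"]

# Signed change of the aim per row: down adds, up subtracts, forward leaves it alone.
sign = df["command"].map({"down": 1, "up": -1, "forward": 0})
df["current_aim"] = (sign * df["value"]).cumsum()

# Only forward rows change position and depth.
is_forward = df["command"] == "forward"
horizontal = df.loc[is_forward, "value"].sum()
depth = (df.loc[is_forward, "value"] * df.loc[is_forward, "current_aim"]).sum()

print(horizontal, depth, horizontal * depth)
```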
<p>The post <a href="https://creatronix.de/2021-advent-of-code-day-2/">2021 – Advent of code – Day 2</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>2021 &#8211; Advent of code &#8211; Day 1</title>
		<link>https://creatronix.de/2021-advent-of-code-day-1/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Thu, 02 Dec 2021 09:30:22 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[advent of code]]></category>
		<category><![CDATA[challenge]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[jupyter]]></category>
		<category><![CDATA[pandas]]></category>
		<guid isPermaLink="false">https://creatronix.de/?p=4285</guid>

					<description><![CDATA[<p>I haven&#8217;t participated in the Advent of Code before, but I&#8217;ve always been curious. What is Advent of Code? It&#8217;s an Advent calendar for programmers. You get 25 challenges starting December 1st. Caveat: you have to solve the first part of a day&#8217;s challenge to unlock its second part 🙂 Day 1 Challenge &#8211; Part 1 On the&#8230;</p>
<p>The post <a href="https://creatronix.de/2021-advent-of-code-day-1/">2021 &#8211; Advent of code &#8211; Day 1</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>I haven&#8217;t participated in the <a href="https://adventofcode.com/2021/day/1">Advent of Code</a> before, but I&#8217;ve always been curious.</p>
<h2>What is Advent of Code?</h2>
<p>It&#8217;s an Advent calendar for programmers. You get 25 challenges starting December 1st. Caveat: you have to solve the first part of a day&#8217;s challenge to unlock its second part 🙂<span id="more-4285"></span></p>
<h2>Day 1 Challenge &#8211; Part 1</h2>
<p>On the first day your first task is to count how many times a value is bigger than its predecessor. They give us some sample data</p>
<pre>199 N/A
200 <strong>bigger</strong>
208 <strong>bigger</strong>
210 <strong>bigger</strong>
200 smaller
207 <strong>bigger</strong>
240 <strong>bigger</strong>
269 <strong>bigger</strong>
260 smaller
263 <strong>bigger</strong></pre>
<p>Counting the rows marked bigger gives us seven increases.</p>
<p>The actual data contains 2000 rows. This isn&#8217;t exactly big data, but I wanted to dust off my Pandas skills, so here we go.</p>
<p>Let&#8217;s look at the data:</p>
<pre>import pandas as pd

df = pd.read_csv("./aoc_day_01_data.txt", header=None)
df</pre>
<p>With the read_csv() function we can read in our data file and convert it into a data frame. It&#8217;s important to pass header=None; otherwise pandas treats the first row as the column header.</p>
<p>Evaluating df as the last expression of a notebook cell shows the data frame (df.describe(), with parentheses, would show summary statistics instead):</p>
<pre class="console_text">         0
0      159
1      158
2      174
3      196
4      197
...    ...
1995  8538
1996  8543
1997  8545
1998  8557
1999  8568

[2000 rows x 1 columns]</pre>
<p>Because we want to reference the columns by name we add a column header</p>
<pre>df.columns = ["original"]</pre>
<p>To compare each cell with the one in the next row, we add a new column that holds the values shifted up by one row:</p>
<pre>df['shifted'] = df['original'].shift(-1)</pre>
<p>The output looks like this:</p>
<table class="dataframe" border="1">
<thead>
<tr>
<th></th>
<th>original</th>
<th>shifted</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>159</td>
<td><strong>158</strong>.0</td>
</tr>
<tr>
<th>1</th>
<td><strong>158</strong></td>
<td>174.0</td>
</tr>
<tr>
<th>2</th>
<td>174</td>
<td>196.0</td>
</tr>
<tr>
<th>3</th>
<td>196</td>
<td>197.0</td>
</tr>
<tr>
<th>4</th>
<td>197</td>
<td>194.0</td>
</tr>
<tr>
<th>&#8230;</th>
<td>&#8230;</td>
<td>&#8230;</td>
</tr>
<tr>
<th>1995</th>
<td>8538</td>
<td>8543.0</td>
</tr>
<tr>
<th>1996</th>
<td>8543</td>
<td>8545.0</td>
</tr>
<tr>
<th>1997</th>
<td>8545</td>
<td>8557.0</td>
</tr>
<tr>
<th>1998</th>
<td>8557</td>
<td>8568.0</td>
</tr>
<tr>
<th>1999</th>
<td>8568</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>We add another column where we place the value True when the value from the current row in the shifted column is bigger than in the original column:</p>
<pre>df['increased'] = (df['shifted'] &gt; df['original'])</pre>
<p>Now it starts to look like the sample data from the introduction:</p>
<table class="dataframe" border="1">
<thead>
<tr>
<th></th>
<th>original</th>
<th>shifted</th>
<th>increased</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>159</td>
<td>158.0</td>
<td>False</td>
</tr>
<tr>
<th>1</th>
<td>158</td>
<td>174.0</td>
<td>True</td>
</tr>
<tr>
<th>2</th>
<td>174</td>
<td>196.0</td>
<td>True</td>
</tr>
<tr>
<th>3</th>
<td>196</td>
<td>197.0</td>
<td>True</td>
</tr>
<tr>
<th>4</th>
<td>197</td>
<td>194.0</td>
<td>False</td>
</tr>
<tr>
<th>&#8230;</th>
<td>&#8230;</td>
<td>&#8230;</td>
<td>&#8230;</td>
</tr>
<tr>
<th>1995</th>
<td>8538</td>
<td>8543.0</td>
<td>True</td>
</tr>
<tr>
<th>1996</th>
<td>8543</td>
<td>8545.0</td>
<td>True</td>
</tr>
<tr>
<th>1997</th>
<td>8545</td>
<td>8557.0</td>
<td>True</td>
</tr>
<tr>
<th>1998</th>
<td>8557</td>
<td>8568.0</td>
<td>True</td>
</tr>
<tr>
<th>1999</th>
<td>8568</td>
<td>NaN</td>
<td>False</td>
</tr>
</tbody>
</table>
<p>The last thing we have to do is count how many times True occurs:</p>
<pre>true_count = df['increased'].sum()</pre>
<p>which gives us &#8220;1583&#8221;.</p>
<p>This may look like a bit of a hack, but it is well-defined: when a boolean column is summed, True counts as 1 and False as 0.</p>
<p>A more elegant solution is to use value_counts:</p>
<pre>df['increased'].value_counts(dropna=False)</pre>
<p>Now the output is:</p>
<pre class="console_text">True     1583
False     417
Name: increased, dtype: int64</pre>
<p>And 1583 is the number we are looking for. This earned us our first golden star and unlocked the second part of the challenge:</p>
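<p>To double-check the approach, the same &#8220;shift, compare and count&#8221; steps can be run on the ten sample values from the introduction. This is a small sketch with the sample inlined as a list rather than read from a file.</p>

```python
import pandas as pd

# The ten sample measurements from the puzzle description.
df = pd.DataFrame({"original": [199, 200, 208, 210, 200, 207, 240, 269, 260, 263]})

# Put each row's successor next to it; the last row gets NaN.
df["shifted"] = df["original"].shift(-1)

# A comparison with NaN is False, so the last row is never counted.
df["increased"] = df["shifted"] > df["original"]

true_count = int(df["increased"].sum())
print(true_count)
```

<p>The result should be 7, matching the seven rows marked bigger in the sample.</p>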
<h2>Part 2</h2>
<p>The second part is a bit more challenging because we have to sum up three adjacent values and compare each sum with the sum of the window that starts one row later.</p>
<pre>199  A       
200  A B     
208  A B C   
210    B C D
200  E   C D
207  E F   D
240  E F G
269    F G H
260      G H
263        H</pre>
<p>I created a new notebook and started like part 1 with reading the data and naming the first column</p>
<pre>import pandas as pd

df = pd.read_csv("./aoc_day_01_data.txt", header=None)
df.columns = ["original"]</pre>
<p>To add the sum of three values to the row of the first value we use the following code</p>
<pre>indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=3)
df["rolling_sum"] = df.original.rolling(window=indexer).sum()</pre>
<p>This demonstrates the power of Pandas once more: sliding window functions are built in!</p>
<p>The rest is the same as in part one: &#8220;shift, compare and count&#8221;.</p>
<pre>df['shifted_rs'] = df['rolling_sum'].shift(-1)
df['increased_rs'] = (df['shifted_rs'] &gt; df['rolling_sum'])
true_count = df['increased_rs'].sum()
true_count</pre>
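<p>As a cross-check on the sample data, here is a sketch that swaps in the default trailing window, rolling(3), instead of FixedForwardWindowIndexer. The window sums are the same numbers, only attached to the last row of each window instead of the first, so the count of increases does not change.</p>

```python
import pandas as pd

sample = pd.Series([199, 200, 208, 210, 200, 207, 240, 269, 260, 263])

# Trailing window: each row holds the sum of itself and the two rows before it.
# The first two rows have no complete window and are dropped.
window_sums = sample.rolling(window=3).sum().dropna()

# diff() is positive wherever a window sum is larger than the previous one.
increases = int((window_sums.diff() > 0).sum())
print(increases)
```

<p>For the sample the eight window sums are 607, 618, 618, 617, 647, 716, 769 and 792, which contain five increases.</p>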
<p>As a little finger exercise I did the same with vanilla Python:</p>
<pre>data = []
with open("./aoc_day_01_test_data.txt") as f:
    for line in f:
        data.append(int(line.rstrip()))

triplet_sums = []

for i, v in enumerate(data):
    if i &lt; (len(data) - 2):
        triplet_sum = data[i] + data[i+1] + data[i+2]
        triplet_sums.append(triplet_sum)
print(triplet_sums)

sums_larger_than_previous_sums = 0
for i, v in enumerate(triplet_sums):
    if i &lt; (len(triplet_sums) - 1):
        if triplet_sums[i] &lt; triplet_sums[i+1]:
            sums_larger_than_previous_sums += 1

print(sums_larger_than_previous_sums)</pre>
<p>Which works but is less elegant.</p>
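<p>A shorter vanilla variant is possible when you notice that two neighbouring windows share two values: A+B+C is smaller than B+C+D exactly when A is smaller than D. The sketch below uses that observation; count_increases is a hypothetical helper name, and the sample values are inlined.</p>

```python
def count_increases(values, window=1):
    """Count positions where the value `window` steps later is larger."""
    # zip stops at the shorter sequence, so incomplete pairs are skipped.
    return sum(later > earlier for earlier, later in zip(values, values[window:]))


sample = [199, 200, 208, 210, 200, 207, 240, 269, 260, 263]

part_one = count_increases(sample)            # compare direct neighbours
part_two = count_increases(sample, window=3)  # compare three-value window sums
print(part_one, part_two)
```

<p>With window=1 this solves part one, with window=3 part two, without building the triplet sums at all.</p>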
<p>Stay tuned for more!</p>
<p>The post <a href="https://creatronix.de/2021-advent-of-code-day-1/">2021 &#8211; Advent of code &#8211; Day 1</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Python Argparse</title>
		<link>https://creatronix.de/python-argparse/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Fri, 19 Mar 2021 12:55:14 +0000</pubDate>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[argparse]]></category>
		<category><![CDATA[Command Line Help]]></category>
		<category><![CDATA[Required "Optional" Parameters]]></category>
		<guid isPermaLink="false">https://creatronix.de/?p=3527</guid>

					<description><![CDATA[<p>Primer: Arguments vs Parameters I&#8217;m sometimes confused. Is it an argument or a parameter? So here it goes: A parameter is a variable in a function definition. When a function is called, the arguments are the data you pass into the function&#8217;s parameters. The same goes for a program: A program has parameters, you call&#8230;</p>
<p>The post <a href="https://creatronix.de/python-argparse/">Python Argparse</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h2>Primer: Arguments vs Parameters</h2>
<p>I&#8217;m sometimes confused. Is it an argument or a parameter? So here it goes:</p>
<blockquote><p>A parameter is a variable in a function definition.</p>
<p>When a function is called, the arguments are the data you pass into the function&#8217;s parameters.</p></blockquote>
<p>The same goes for a program: A program has parameters, you <strong>call</strong> a program with arguments.</p>
<h2>Argparse Module</h2>
<p><img fetchpriority="high" decoding="async" src="https://creatronix.de/wp-content/uploads/2021/03/argparse-e1661241731903.jpg" alt="" class="alignnone size-full wp-image-3535" width="500" height="380" srcset="https://creatronix.de/wp-content/uploads/2021/03/argparse-e1661241731903.jpg 500w, https://creatronix.de/wp-content/uploads/2021/03/argparse-e1661241731903-300x228.jpg 300w" sizes="(max-width: 500px) 100vw, 500px" /></p>
<p>Batteries included! The argparse module provides a nice way to parse your program arguments:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('positional_param')
    parser.add_argument('-o', '--optional_param')
    parser.add_argument('-r', '--required_param', required=True)
    args = parser.parse_args()
    print(args.positional_param)
    print(args.optional_param)
    print(args.required_param)</code></pre>
</div>
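<p>For quick experiments you do not need a command line at all: parse_args() accepts a list of strings in place of sys.argv[1:]. A small sketch using the same three parameters as above, with made-up argument values:</p>

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('positional_param')
parser.add_argument('-o', '--optional_param')
parser.add_argument('-r', '--required_param', required=True)

# The list stands in for what would normally be typed after the script name.
args = parser.parse_args(['foo', '-r', 'bar'])

print(args.positional_param)
print(args.optional_param)
print(args.required_param)
```

<p>The optional parameter was not provided here, so its value is None.</p>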
<h3>Positional Parameters</h3>
<p>Positional parameters are useful for shorter programs. They are not named on the command line, so you could call them &#8220;unnamed parameters&#8221; as well.</p>
<p>It is important that you keep the right sequence when calling the program:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>parser = argparse.ArgumentParser()
parser.add_argument('positional_param_one')
parser.add_argument('positional_param_two')
args = parser.parse_args()
print(args.positional_param_one)
print(args.positional_param_two)</code></pre>
</div>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>$ python positional_args.py foo bar
foo
bar</code></pre>
</div>
<h3>Optional Parameters</h3>
<p>Optional parameters have a name, which you have to use when providing an argument to the program:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>parser.add_argument('--optional_param')</code></pre>
</div>
<p><strong>The double dash indicates an optional parameter</strong></p>
<p>You can provide a shorthand for an optional parameter as well</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>parser.add_argument(<strong>'-o'</strong>, '--optional_param')</code></pre>
</div>
<p>When you leave out an optional argument, its value defaults to None.</p>
<p>Two ways of calling a program with optional arguments:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>$ python optional_args.py
None</code></pre>
</div>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>$ python optional_args.py --optional_param foobar
foobar</code></pre>
</div>
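<p>If None is not a useful fallback, add_argument() also takes a default. A minimal sketch; the value 'fallback' is just a placeholder I picked:</p>

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-o', '--optional_param', default='fallback')

# No argument given: the default is used.
without_arg = parser.parse_args([])

# Long and short form both fill the same attribute.
long_form = parser.parse_args(['--optional_param', 'foobar'])
short_form = parser.parse_args(['-o', 'foobar'])

print(without_arg.optional_param, long_form.optional_param, short_form.optional_param)
```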
<h3>Required &#8220;Optional&#8221; Parameters</h3>
<p>Now it gets a bit weird: You can make an optional parameter required.</p>
<p>(another strong argument that these should be named &#8220;named parameters&#8221;)</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>parser.add_argument('-r', '--required_param', required=True)</code></pre>
<p>If you do not provide the argument you will get:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>$ python required_params.py
usage: required_params.py [-h] -r REQUIRED_PARAM
required_params.py: error: the following arguments are required: -r/--required_param</code></pre>
</div>
<p>So better provide the parameter:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>$ python required_params.py -r foobar
foobar</code></pre>
</div>
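<p>Under the hood argparse prints the usage and error message to stderr and then ends the program by raising SystemExit with exit code 2. The following sketch makes that visible by catching the exception, which is something you would normally only do in a test:</p>

```python
import argparse

parser = argparse.ArgumentParser(prog='required_params.py')
parser.add_argument('-r', '--required_param', required=True)

try:
    # An empty list simulates calling the script without any arguments.
    parser.parse_args([])
    exit_code = None
except SystemExit as exc:
    exit_code = exc.code

print(exit_code)
```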
<h2>Command Line Help</h2>
<p>When you start your script with the -h or --help parameter, you will get a nice overview of the usage:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>$ python main.py -h
usage: main.py [-h] [-o OPTIONAL_PARAM] -r REQUIRED_PARAM positional_param

positional arguments:
  positional_param

optional arguments:
  -h, --help            show this help message and exit
  -o OPTIONAL_PARAM, --optional_param OPTIONAL_PARAM
  -r REQUIRED_PARAM, --required_param REQUIRED_PARAM</code></pre>
</div>
</div>
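<p>The overview becomes more useful once each parameter has a help text. A short sketch, with description and help strings that I invented for illustration; format_help() returns the same text that -h would print:</p>

```python
import argparse

parser = argparse.ArgumentParser(description='Demo of help texts')
parser.add_argument('positional_param', help='a required positional value')
parser.add_argument('-o', '--optional_param', help='an optional value')

# Build the help message as a string instead of printing it and exiting.
help_text = parser.format_help()
print(help_text)
```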
<p>The post <a href="https://creatronix.de/python-argparse/">Python Argparse</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Pandas Cheat Sheet</title>
		<link>https://creatronix.de/pandas-cheat-sheet/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Fri, 05 Mar 2021 13:17:19 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[cheat sheet]]></category>
		<category><![CDATA[csv]]></category>
		<category><![CDATA[cumsum]]></category>
		<category><![CDATA[cumulative sum]]></category>
		<category><![CDATA[datetime]]></category>
		<category><![CDATA[delimiter]]></category>
		<category><![CDATA[dropping columns]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[head tail]]></category>
		<category><![CDATA[header]]></category>
		<category><![CDATA[iloc]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[loc]]></category>
		<category><![CDATA[pandas]]></category>
		<category><![CDATA[renamin columns]]></category>
		<category><![CDATA[rolling sum]]></category>
		<category><![CDATA[shifting]]></category>
		<category><![CDATA[snippets]]></category>
		<category><![CDATA[splitting strings]]></category>
		<category><![CDATA[value_counts]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=3112</guid>

					<description><![CDATA[<p>If you are new to Pandas feel free to read Introduction to Pandas I&#8217;ve assembled some pandas code snippets Reading Data Reading CSV import pandas as pd # read from csv df = pd.read_csv("path_to_file") The file can also be a plain text file; the file suffix is ignored. The default delimiter for comma-separated value files is the comma. If you&#8230;</p>
<p>The post <a href="https://creatronix.de/pandas-cheat-sheet/">Pandas Cheat Sheet</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>If you are new to Pandas feel free to read <a href="https://creatronix.de/introduction-to-pandas/">Introduction to Pandas</a></p>
<p>I&#8217;ve assembled some pandas code snippets</p>
<h2>Reading Data</h2>
<h3>Reading CSV</h3>
<pre>import pandas as pd

# read from csv
df = pd.read_csv("path_to_file")</pre>
<p>The file can also be a plain text file; the file suffix is ignored.</p>
<p>The default delimiter for comma-separated value files is the comma. If your data uses another delimiter you can specify it via:</p>
<pre>delimiter=";"</pre>
<p>If your data has no header you can pass header=None into the function</p>
<pre>df = pd.read_csv("./aoc_day_01_data.txt", header=None)</pre>
<p>With skiprows you can start reading in at any row</p>
<pre>skiprows=8</pre>
<p>Sometimes you need to alter the encoding as well:</p>
<pre>encoding="cp1252"</pre>
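<p>The options above are all keyword arguments of read_csv and can be combined in one call. A sketch with made-up data held in a string so that it runs without a file; encoding is left out because it only matters when pandas decodes bytes from a real file:</p>

```python
import io

import pandas as pd

# Two junk lines, then a semicolon-separated table with a header row.
raw = "export from tool X\ngenerated 2021-03-05\nname;value\nalpha;1\nbeta;2\n"

# skiprows=2 drops the two junk lines; the next line becomes the header.
df = pd.read_csv(io.StringIO(raw), delimiter=";", skiprows=2)
print(df)
```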
<h3>Reading Excel</h3>
<p>You can read Excel files as well, but you need to install openpyxl first:</p>
<pre>pip install openpyxl</pre>
<pre>df = pd.read_excel("./my_excel_sheet.xlsx")</pre>
<p>With sheet_name you can select the individual sheet:</p>
<pre>sheet_name="my_sheet_1"</pre>
<h2>Inspecting data</h2>
<h3>Basic information</h3>
<pre>df.describe()</pre>
<h3>Length</h3>
<pre>len(df)</pre>
<h3>Showing entries</h3>
<pre>df.head()</pre>
<pre>df.tail(10)</pre>
<h3>Indexing</h3>
<pre><span class="n">df</span><span class="p">[</span><span class="s1">'A'</span><span class="p">]</span></pre>
<p>gives you column A</p>
<p>iloc gives you entries based on their integer position:</p>
<pre>#      [row, column]
df.iloc[0,   0]</pre>
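<p>loc gives you entries based on row and column labels; a bare colon selects everything along that axis:</p>
<pre>#     [row, column]
df.loc[:, :]</pre>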
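<p>The difference between the two is easiest to see on a frame whose index is not 0, 1, 2. A small sketch with invented data:</p>

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [1, 2, 3]}, index=["x", "y", "z"])

first_cell = df.iloc[0, 0]     # position-based: first row, first column
by_label = df.loc["y", "B"]    # label-based: row "y", column "B"
whole_column = df.loc[:, "A"]  # the colon selects every row

print(first_cell, by_label, whole_column.tolist())
```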
<h2>Data Cleaning</h2>
<h3>Dropping columns</h3>
<pre>del df["column_name"]</pre>
<h3>Renaming columns</h3>
<pre>df.columns = ["new_column_name", ...]</pre>
<h3>Comparing columns</h3>
<pre>df['increased'] = (df['shifted'] &gt; df['original'])</pre>
<h3>Shifting columns</h3>
<pre>df['shifted'] = df['original'].shift(-1)</pre>
<h2>Splitting</h2>
<h3>Splitting strings into individual columns</h3>
<pre>df = pd.DataFrame(df["original"].str.split('').tolist())</pre>
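<p>If the goal is one character per column (that is my assumption about the snippet above, for example for strings of binary digits), converting each string to a list of characters is an alternative. A sketch with invented data:</p>

```python
import pandas as pd

df = pd.DataFrame({"original": ["101", "010"]})

# list("101") gives ['1', '0', '1']; each list becomes one row of the new frame.
chars = pd.DataFrame(df["original"].apply(list).tolist())
print(chars)
```

<p>The result has one column per character position and one row per original string.</p>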
<h2>Counting and Calculating</h2>
<h3>Summing columns</h3>
<pre>df["value"].sum()</pre>
<h3>Cumulative sum</h3>
<pre>df["aim"].cumsum()</pre>
<h3>Rolling sum</h3>
<pre>indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=3)
df["rolling_sum"] = df.original.rolling(window=indexer).sum()</pre>
<h3>Counting value occurrences</h3>
<pre>df['increased'].value_counts(dropna=False)</pre>
<h3>Counting occurrences for all columns</h3>
<pre>df = pd.concat([df[column].value_counts() for column in df], axis = 1)</pre>
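<p>A small worked example of the all-columns count, with invented data. The keys argument is my addition: as far as I know, newer pandas versions give every value_counts() result the same name, so passing the column names as keys keeps the result columns apart.</p>

```python
import pandas as pd

df = pd.DataFrame({"x": ["a", "a", "b"], "y": ["b", "b", "b"]})

# One value_counts() Series per column, glued together side by side.
counts = pd.concat(
    [df[column].value_counts() for column in df],
    axis=1,
    keys=df.columns,
)
print(counts)
```

<p>Values that never occur in a column show up as NaN in that column.</p>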
<h3>Convert column to datetime</h3>
<pre>df['date'] = pd.to_datetime(df['date'])</pre>
<h3>Convert datetime to minutes since midnight</h3>
<pre>df['msm'] = df['date'].dt.hour * 60 + df['date'].dt.minute</pre>
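<p>Both datetime snippets together, as a sketch on two invented timestamps; the column names date and msm are the ones used above:</p>

```python
import pandas as pd

df = pd.DataFrame({"date": ["2021-03-05 13:17:19", "2021-03-05 00:05:00"]})

# Parse the strings into real datetimes so that the .dt accessor works.
df["date"] = pd.to_datetime(df["date"])

# Minutes since midnight: full hours times 60 plus the minutes; seconds are ignored.
df["msm"] = df["date"].dt.hour * 60 + df["date"].dt.minute
print(df)
```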
<p>The post <a href="https://creatronix.de/pandas-cheat-sheet/">Pandas Cheat Sheet</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>SQLite3: Python and SQL</title>
		<link>https://creatronix.de/sqlite3-python-and-sql/</link>
					<comments>https://creatronix.de/sqlite3-python-and-sql/#respond</comments>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Thu, 02 Apr 2020 08:57:45 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[sqlite]]></category>
		<category><![CDATA[sqlite3]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=1323</guid>

					<description><![CDATA[<p>Everything we did in the last articles of the series SQL-Tutorial was a dry run because we just used SQLFiddle. So let&#8217;s start with a real database like SQLite. SQLite is a file-based RDBMS and can be used for e.g. websites. The official docs say: &#8220;SQLite works great as the database engine for&#8230;</p>
<p>The post <a href="https://creatronix.de/sqlite3-python-and-sql/">SQLite3: Python and SQL</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Everything we did in the last articles of the series <a href="https://creatronix.de/sql-tutorial/">SQL-Tutorial</a> was a dry run because we just used SQLFiddle.</p>
<p>So let&#8217;s start with a real database like SQLite.</p>
<p>SQLite is a file-based RDBMS and can be used for e.g. websites. The official docs say:</p>
<p><em>&#8220;SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites). [..] Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite.&#8221;</em></p>
<p>Because Knight Industries is neither Google, Amazon nor Facebook, we can definitely use SQLite.</p>
<h2>Creating and connecting to a database</h2>
<p>In Python it is pretty easy to connect to a SQLite database:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>from sqlite3 import connect 
db_connection = connect('knight_industries.db')</code></pre>
</div>
<p>If the file knight_industries.db does not exist, it will be created automagically. A nice little feature of the sqlite3 library.</p>
<p>But be careful: if you already have a database file and you mess up the path in the connect statement, you will wonder why you cannot access your data, because a new, empty file is created silently.</p>
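<p>One way to guard against the silent creation is to open the database through a URI with mode=rw, which opens an existing file for reading and writing but refuses to create a new one. A sketch; the file name and the temporary directory are placeholders of mine:</p>

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A path that does not exist, as if the file name had a typo.
    missing = Path(tmp) / "typo_in_name.db"
    uri = missing.as_uri() + "?mode=rw"
    try:
        sqlite3.connect(uri, uri=True)
        error = None
    except sqlite3.OperationalError as exc:
        # sqlite3 complains instead of creating an empty database.
        error = exc
    file_was_created = missing.exists()

print(error, file_was_created)
```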
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>cursor = db_connection.cursor()
# create the table, then insert one row; the ? placeholders keep the values out of the SQL string
cursor.execute('''CREATE TABLE operatives (id INTEGER, name TEXT, birthday DATE)''')
cursor.execute('''INSERT INTO operatives (id, name, birthday)
                  VALUES (?, ?, ?)''', (1, "Michael Arthur Long", "1949-01-09"))

db_connection.commit()  # write the insert to the database file
cursor.execute('''SELECT * FROM operatives''')
print(cursor.fetchone())
db_connection.close()</code></pre>
</div>
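<p>To try the statements without leaving a file behind, you can connect to the special name :memory:, which creates a throwaway database in RAM. A sketch with the same table and the same row, plus a WHERE clause that uses a placeholder:</p>

```python
import sqlite3

# ":memory:" creates a temporary database that vanishes when the connection closes.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

cursor.execute("CREATE TABLE operatives (id INTEGER, name TEXT, birthday DATE)")
cursor.execute(
    "INSERT INTO operatives (id, name, birthday) VALUES (?, ?, ?)",
    (1, "Michael Arthur Long", "1949-01-09"),
)
connection.commit()

# Parameters are always passed as a sequence, hence the one-element tuple.
cursor.execute("SELECT name, birthday FROM operatives WHERE id = ?", (1,))
row = cursor.fetchone()
connection.close()

print(row)
```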
<p>The post <a href="https://creatronix.de/sqlite3-python-and-sql/">SQLite3: Python and SQL</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://creatronix.de/sqlite3-python-and-sql/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Intro to OpenCV with Python</title>
		<link>https://creatronix.de/intro-to-opencv-with-python/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Mon, 23 Jul 2018 19:11:18 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[bgr]]></category>
		<category><![CDATA[blur]]></category>
		<category><![CDATA[gaussian]]></category>
		<category><![CDATA[noise]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[rgb]]></category>
		<category><![CDATA[smoothing]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=1593</guid>

					<description><![CDATA[<p>Installation To work with OpenCV from python, you need to install it first. We additionally install numpy and matplotlib as well pip install opencv-python numpy matplotlib Reading Images from file After we import cv2 we can directly work with images like so: import cv2 img = cv2.imread("doc_brown.png") For showing the image, it is recommended to&#8230;</p>
<p>The post <a href="https://creatronix.de/intro-to-opencv-with-python/">Intro to OpenCV with Python</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h2>Installation</h2>
<p>To work with OpenCV from Python, you need to install it first. We install numpy and matplotlib as well:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>pip install opencv-python numpy matplotlib</code></pre>
</div>
<h2>Reading Images from file</h2>
<p>After we import cv2 we can directly work with images like so:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>import cv2 
img = cv2.imread("doc_brown.png")</code></pre>
</div>
<p>For showing the image, it is recommended to use matplotlib:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>import matplotlib.pyplot as plt 
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) 
plt.imshow(img) 
plt.show()</code></pre>
</div>
<p>OpenCV stores images internally in the BGR format &#8211; blue &#8211; green &#8211; red so we have to convert to RGB before displaying them.</p>
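<p>Since an OpenCV image is a numpy array, the channel order can be illustrated with numpy alone. The sketch below uses a tiny hand-made array instead of a real image; for a three-channel image, reversing the last axis is the same reordering that the BGR to RGB conversion performs.</p>

```python
import numpy as np

# A 1x2 "image" in BGR order: one pure blue pixel and one pure red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the first and third channel (B and R).
rgb = bgr[..., ::-1]

print(rgb[0, 0], rgb[0, 1])
```

<p>In RGB order the blue pixel has its 255 in the last position and the red pixel in the first.</p>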
<h2><img decoding="async" class="alignnone size-full wp-image-1748" src="https://creatronix.de/wp-content/uploads/2018/07/doc_brown.png" alt="" width="640" height="480" srcset="https://creatronix.de/wp-content/uploads/2018/07/doc_brown.png 640w, https://creatronix.de/wp-content/uploads/2018/07/doc_brown-300x225.png 300w" sizes="(max-width: 640px) 100vw, 640px" /></h2>
<h2>Image Shape</h2>
<p>The image is now available as a numpy array, which can already tell us more about the image itself:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>print(img.shape) 
# (205, 236, 3)</code></pre>
</div>
<p>The tuple contains the number of rows, columns and channels.</p>
<h2>Manipulate brightness</h2>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>import numpy as np 
brightness = np.zeros(img.shape, dtype="uint8") + 30 
img = cv2.add(img, brightness)</code></pre>
</div>
<p>Manipulating brightness works like this: create a numpy array with the same shape as the image, filled with the desired brightness offset (here 30), and add it to the original image with cv2.add.</p>
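<p>cv2.add is used here rather than the + operator because it saturates: sums above 255 are clipped to 255, whereas plain numpy addition on uint8 arrays wraps around. A minimal sketch of the difference, using only numpy and made-up pixel values to emulate the saturating behaviour:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>import numpy as np

# Hypothetical pixel values and a brightness offset of 30
pixels = np.array([100, 200, 250], dtype="uint8")
brightness = np.full(pixels.shape, 30, dtype="uint8")

# Plain uint8 addition wraps around at 256
wrapped = pixels + brightness

# Emulate saturation: add in a wider type, clip to 0..255, cast back
saturated = np.clip(pixels.astype("int16") + brightness, 0, 255).astype("uint8")

print(wrapped.tolist())    # [130, 230, 24]
print(saturated.tolist())  # [130, 230, 255]</code></pre>
</div>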
<h2><img decoding="async" class="alignnone size-full wp-image-1749" src="https://creatronix.de/wp-content/uploads/2018/07/bright_doc_30.png" alt="" width="640" height="480" srcset="https://creatronix.de/wp-content/uploads/2018/07/bright_doc_30.png 640w, https://creatronix.de/wp-content/uploads/2018/07/bright_doc_30-300x225.png 300w" sizes="(max-width: 640px) 100vw, 640px" /></h2>
<h2>Adding noise</h2>
<p>Adding noise works the same way, but instead of adding a fixed value to every pixel you add random values. The example below uses np.random.randint, which draws uniformly distributed integers between 0 and 99.</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>noise = np.random.randint(0, 100, size=img.shape, dtype="uint8") 
img = cv2.add(img, noise)</code></pre>
</div>
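<p>np.random.randint draws uniformly distributed integers. If normally distributed (Gaussian) noise is wanted instead, a sketch could look like the following. The stand-in image, the seed and the standard deviation of 25 are assumptions chosen for illustration:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in for an image loaded with cv2.imread (assumption: 4x4 pixels, 3 channels, mid-gray)
img = np.full((4, 4, 3), 128, dtype="uint8")

# Normally distributed noise with mean 0 and standard deviation 25
noise = rng.normal(loc=0, scale=25, size=img.shape)

# Add in floating point, clip to the valid range, then convert back to uint8
noisy = np.clip(img.astype("float64") + noise, 0, 255).astype("uint8")

print(noisy.shape, noisy.dtype)</code></pre>
</div>
<p>Because the noise can be negative, the sum is computed in floating point and clipped before casting back to uint8.</p>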
<p><img decoding="async" class="alignnone size-full wp-image-1811" src="https://creatronix.de/wp-content/uploads/2018/07/noisy_doc.png" alt="" width="640" height="480" srcset="https://creatronix.de/wp-content/uploads/2018/07/noisy_doc.png 640w, https://creatronix.de/wp-content/uploads/2018/07/noisy_doc-300x225.png 300w" sizes="(max-width: 640px) 100vw, 640px" /></p>
<h2>Smoothing</h2>
<p>Smoothing reduces the noise in an image by applying a Gaussian blur:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>img = cv2.GaussianBlur(img, ksize=(31, 31), sigmaX=5)</code></pre>
</div>
<h2><img decoding="async" class="alignnone size-full wp-image-1812" src="https://creatronix.de/wp-content/uploads/2018/07/smooth_doc.png" alt="" width="640" height="480" srcset="https://creatronix.de/wp-content/uploads/2018/07/smooth_doc.png 640w, https://creatronix.de/wp-content/uploads/2018/07/smooth_doc-300x225.png 300w" sizes="(max-width: 640px) 100vw, 640px" /></h2>
<h2>Gray-scale conversion</h2>
<p>Last but not least, we can convert an image to gray scale. Note that for showing or storing the image with matplotlib you need to pass the argument cmap="gray".</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) 
plt.imshow(img, cmap='gray')</code></pre>
</div>
<p><img decoding="async" class="alignnone size-full wp-image-1815" src="https://creatronix.de/wp-content/uploads/2018/07/gray_doc.png" alt="" width="640" height="480" srcset="https://creatronix.de/wp-content/uploads/2018/07/gray_doc.png 640w, https://creatronix.de/wp-content/uploads/2018/07/gray_doc-300x225.png 300w" sizes="(max-width: 640px) 100vw, 640px" /></p>
<p>Have fun fiddling around with OpenCV!</p>
<h2>Github Repo</h2>
<p><a href="https://github.com/jboegeholz/introduction_to_opencv">https://github.com/jboegeholz/introduction_to_opencv</a></p>
<p>The post <a href="https://creatronix.de/intro-to-opencv-with-python/">Intro to OpenCV with Python</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Linear Algebra with numpy</title>
		<link>https://creatronix.de/linear-algebra-with-numpy/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Fri, 04 May 2018 11:42:22 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[numpy]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=1314</guid>

					<description><![CDATA[<p>Numpy is a package for scientific computing in Python. It is blazing fast due to its implementation in C. It is often used together with pandas, matplotlib and Jupyter notebooks. Often these packages are referred to as the datascience stack. Installation You can install numpy via pip pip install numpy Basic Usage In the datascience&#8230;</p>
<p>The post <a href="https://creatronix.de/linear-algebra-with-numpy/">Linear Algebra with numpy</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Numpy is a package for scientific computing in Python. It is blazing fast due to its implementation in C.</p>
<p>It is often used together with <a href="https://creatronix.de/introduction-to-pandas/">pandas</a>, <a href="https://creatronix.de/introduction-to-matplotlib/">matplotlib</a> and <a href="https://creatronix.de/introduction-to-jupyter-notebook/">Jupyter</a> notebooks. Often these packages are referred to as the datascience stack.</p>
<h2>Installation</h2>
<p>You can install numpy via pip</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>pip install numpy</code></pre>
</div>
<h2>Basic Usage</h2>
<p>In the datascience world numpy is often imported like this:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>import numpy as np</code></pre>
</div>
<p>The &#8220;as&#8221; keyword defines a so-called alias. Now you can use structures from numpy by referencing them with &#8220;np&#8221; instead of the whole name.</p>
<p>Think &#8220;abbreviation&#8221;.</p>
<h3>n-dimensional array</h3>
<p>The most important data structure is ndarray, which is short for n-dimensional array.</p>
<p>You can convert a list to a numpy array with the np.array function:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>my_list = [1, 2, 3, 4] 
my_array = np.array(my_list)</code></pre>
</div>
<p>You can also convert an array back to a list with</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>my_new_list = my_array.tolist()</code></pre>
</div>
<p>You can retrieve the dimensionality of an array with the ndim property:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>my_array.ndim</code></pre>
</div>
<p>and get the number of elements along each dimension with the shape property:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>my_array.shape</code></pre>
</div>
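<p>For a one-dimensional array the two properties look very similar, so a two-dimensional example may make the difference clearer. This is a small sketch with a made-up 2x3 matrix; the size property is added for comparison:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(matrix.ndim)   # 2      (number of dimensions)
print(matrix.shape)  # (2, 3) (2 rows, 3 columns)
print(matrix.size)   # 6      (total number of elements)</code></pre>
</div>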
<h2>Vector arithmetic</h2>
<h3>Addition / Subtraction</h3>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>a = np.array([1, 2, 3, 4]) 
b = np.array([4, 3, 2, 1]) 
a + b 
array([5, 5, 5, 5]) 

a - b 
array([-3, -1, 1, 3])</code></pre>
</div>
<h3>Scalar Multiplication</h3>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>a = np.array([1, 2, 3, 4]) 
a * 3 

array([3, 6, 9, 12])</code></pre>
</div>
<p>To see why it is charming to use numpy&#8217;s array for this operation, consider the alternative with a plain Python list:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>c = [1,2,3,4] 
d = [x * 3 for x in c]</code></pre>
</div>
<h3>Dot Product</h3>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>a = np.array([1,2,3,4]) 
b = np.array([4,3,2,1]) 
a.dot(b) 

20 # 1*4 + 2*3 + 3*2 + 4*1</code></pre>
</div>
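<p>The dot product is the sum of the element-wise products. The following sketch computes it by hand and compares it with the @ operator, which gives the same result as dot() for one-dimensional arrays:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 3, 2, 1])

# Element-wise products, summed up by hand
manual = sum(x * y for x, y in zip(a.tolist(), b.tolist()))

# The same result via the @ operator
print(manual, a @ b)  # 20 20</code></pre>
</div>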
<p>Learn more about numpy:</p>
<p><a href="https://creatronix.de/numpy-random-choice/">numpy random choice</a></p>
<p><a href="https://creatronix.de/numpy-linspace-function/">Numpy linspace function</a></p>
<p><a href="https://github.com/jboegeholz/introduction_to_numpy/blob/master/01_numpy_arrays.ipynb">Project on github</a></p>
<p>The post <a href="https://creatronix.de/linear-algebra-with-numpy/">Linear Algebra with numpy</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>pip optional dependencies</title>
		<link>https://creatronix.de/pip-optional-dependencies/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Mon, 16 Apr 2018 12:11:48 +0000</pubDate>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Flask]]></category>
		<category><![CDATA[optional]]></category>
		<category><![CDATA[pip]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=1464</guid>

					<description><![CDATA[<p>Sometimes you want to make your python package usable for different situations, e.g. flask or bottle or django. If You want to minimize dependencies You can use an optional dependency in setup.py: extras_require={ 'flask': ['Flask&#62;=0.8', 'blinker&#62;=1.1'] } Now you can install the library with: pip install raven[flask] &#160;</p>
<p>The post <a href="https://creatronix.de/pip-optional-dependencies/">pip optional dependencies</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Sometimes you want to make your python package usable for different situations, e.g. flask or bottle or django.</p>
<p>If you want to minimize dependencies, you can declare an optional dependency in setup.py:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>extras_require={ 
    'flask': ['Flask&gt;=0.8', 'blinker&gt;=1.1'] 
}</code></pre>
</div>
<p>Now you can install the library with:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>pip install raven[flask]</code></pre>
</div>
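<p>Inside the package, code that relies on an optional dependency has to cope with it being absent. A common pattern is to guard the import. The following is only a sketch, and the helper name flask_available is hypothetical:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>try:
    # Only importable if Flask is installed, e.g. via the [flask] extra
    import flask
except ImportError:
    flask = None


def flask_available():
    """Hypothetical helper: report whether the optional dependency is present."""
    return flask is not None


print(flask_available())</code></pre>
</div>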
<p>&nbsp;</p>
<p>The post <a href="https://creatronix.de/pip-optional-dependencies/">pip optional dependencies</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>My personal roadmap for learning data science in 2018</title>
		<link>https://creatronix.de/my-personal-road-map-for-learning-data-science/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Wed, 13 Dec 2017 14:05:14 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[Self-Improvement & Personal Finance]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[new year's resolution]]></category>
		<category><![CDATA[numpy]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[road map]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=1177</guid>

					<description><![CDATA[<p>I got confused by all the buzzwords: data science, machine learning, deep learning, neural nets, artificial intelligence, big data, and so on and so on. As an engineer I like to put some structure to the chaos. Inspired by Roadmap: How to Learn Machine Learning in 6 Months and Tetiana Ivanova &#8211; How to become&#8230;</p>
<p>The post <a href="https://creatronix.de/my-personal-road-map-for-learning-data-science/">My personal roadmap for learning data science in 2018</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>I got confused by all the buzzwords: data science, machine learning, deep learning, neural nets, artificial intelligence, big data, and so on and so on.</p>
<p><img decoding="async" class="alignnone size-full wp-image-1253" src="https://creatronix.de/wp-content/uploads/2017/12/normal_distribution_3.png" alt="" width="487" height="469" srcset="https://creatronix.de/wp-content/uploads/2017/12/normal_distribution_3.png 487w, https://creatronix.de/wp-content/uploads/2017/12/normal_distribution_3-300x289.png 300w" sizes="(max-width: 487px) 100vw, 487px" /></p>
<p>As an engineer I like to bring some structure to the chaos. Inspired by <a href="https://youtu.be/MOdlp1d0PNA"><span id="eow-title" class="watch-title" dir="ltr" title="Roadmap: How to Learn Machine Learning in 6 Months">Roadmap: How to Learn Machine Learning in 6 Months </span></a>and <a href="https://youtu.be/rIofV14c0tc"><span id="eow-title" class="watch-title" dir="ltr" title="Tetiana Ivanova - How to become a Data Scientist in 6 months a hacker’s approach to career planning">Tetiana Ivanova &#8211; How to become a Data Scientist in 6 months a hacker’s approach to career planning</span></a>, I built my own learning roadmap for this year:<br />
So 2018 will be all about Data Science. Having heard about the <a href="http://jarche.com/pkm/">Personal Knowledge Mastery</a> concept at SWEC17, I am going to tackle the learning process on different levels.</p>
<h2>Watch the Pros</h2>
<p>Thanks to OpenCourseWare, there are a ton of awesome university courses online, e.g.:</p>
<p><a href="https://youtu.be/C1lhuz6pZC0">MIT 6.0002 Introduction to Computational Thinking and Data Science</a></p>
<h2>Learn the tools</h2>
<p>There is already a whole bunch of tools that can be considered part of a standard data science stack. Because my main language is Python, the focus is of course mostly on Python modules.</p>
<ul>
<li><a href="https://creatronix.de/introduction-to-jupyter-notebook/">Jupyter Notebook</a></li>
<li><a href="https://creatronix.de/linear-algebra-with-numpy-part-1/">numpy</a></li>
<li>pandas</li>
<li><a href="https://seaborn.pydata.org/">seaborn</a></li>
<li><a href="https://bokeh.pydata.org/en/latest/">bokeh</a></li>
<li><a href="http://holoviews.org/">holoviews</a></li>
<li><a href="http://scikit-learn.org/stable/">scikit-learn</a></li>
<li><a href="https://keras.io/">keras</a> / <a href="https://www.tensorflow.org/">TensorFlow</a></li>
<li>Tableau</li>
</ul>
<h2>Finishing Udacity / Udemy courses</h2>
<p>To brush up on my Python skills and my knowledge of basic computer science, I will finish some online courses I have already started:</p>
<ul>
<li style="list-style-type: none;">
<ul>
<li>[  ] <a href="https://creatronix.de/ud120-intro-to-machine-learning/">Introduction to Machine Learning</a></li>
<li>[  ] Python Bootcamp</li>
<li>[  ] Algorithms and Data Structures</li>
<li>[  ] Introduction to Artificial Intelligence</li>
<li>[  ] <a href="https://classroom.udacity.com/courses/ud810/">Introduction to computer vision</a></li>
<li>[  ] <a href="https://classroom.udacity.com/courses/cs373">Artificial Intelligence for Robotics</a></li>
</ul>
</li>
</ul>
<h2>Reading data science books</h2>
<p>To get a broad overview I bought two books on DS / ML</p>
<ul>
<li>[  ] Data Science from Scratch</li>
<li>[  ] Hands on Machine Learning</li>
</ul>
<h2>Do Exercises on Kaggle</h2>
<ul>
<li>[x] Create Account at Kaggle</li>
<li>[  ] Do first exercise</li>
<li>[  ] Participate in a contest</li>
</ul>
<h2>Visit Meetups about Data Science</h2>
<p>[  ] Visit <a href="https://www.meetup.com/de-DE/Nuernberg-Big-Data/?_af_cid=Nuernberg-Big-Data">Big Data Meetup Events</a></p>
<h2>Add some Peer Pressure</h2>
<p>My brother-in-law and I teamed up and built a WhatsApp learn &amp; exchange group. We are currently four members.</p>
<h2>Write Blog Articles</h2>
<p>I will try to incorporate some of the stuff I&#8217;ve learned into blog articles.</p>
<p>I already did</p>
<ul>
<li><a href="https://creatronix.de/bayes-theorem-part-1/">Bayes’ Theorem Part 1</a></li>
<li><a href="https://creatronix.de/data-science-overview/">Data Science Overview</a></li>
<li><a href="https://creatronix.de/classification-precision-and-recall/">Classification: Precision and Recall</a></li>
<li><a href="https://creatronix.de/confusion-matrix/">Confusion Matrix</a></li>
<li><a href="https://creatronix.de/ud120-intro-to-machine-learning/">UD120 Intro to Machine Learning</a></li>
<li><a href="https://creatronix.de/lesson-2-naive-bayes/">Lesson 2: Naive Bayes</a></li>
<li><a href="https://creatronix.de/lesson3-support-vector-machines/">Lesson 3: Support Vector Machines</a></li>
</ul>
<p>So stay tuned!</p>
<p>The post <a href="https://creatronix.de/my-personal-road-map-for-learning-data-science/">My personal roadmap for learning data science in 2018</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Regular Expressions Demystified &#8211; A Mini DSL for Regex in Python</title>
		<link>https://creatronix.de/regular-expressions-demystified/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Mon, 05 Jun 2017 12:36:31 +0000</pubDate>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[DSL]]></category>
		<category><![CDATA[meta character]]></category>
		<category><![CDATA[Regex]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=578</guid>

					<description><![CDATA[<p>Motivation Every Junior Developer needs some pet projects to try out some techniques he or she is not familiar with already. Because I&#8217;ve always had a hard time with regular expressions (I know that they are useful, but I use them so rarely that I cannot get a hold of all the syntax) I&#8217;ve started&#8230;</p>
<p>The post <a href="https://creatronix.de/regular-expressions-demystified/">Regular Expressions Demystified &#8211; A Mini DSL for Regex in Python</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h2>Motivation</h2>
<p>Every Junior Developer needs some pet projects to try out some techniques he or she is not familiar with already.</p>
<p>Because I&#8217;ve always had a hard time with regular expressions (I know that they are useful, but I use them so rarely that I cannot get a hold of all the syntax) I&#8217;ve started a little project to ease up the use of RegEx.</p>
<p>&nbsp;</p>
<h2>What are Regular Expressions aka RegEx?</h2>
<p>A RegEx is a sequence of characters that describes a search pattern for text.</p>
<p>Say you have an input string which contains spaces, a tab and a line break:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>input_string = " \tJoernBoegeholz \n"</code></pre>
</div>
<p>You will certainly agree that it isn&#8217;t a good idea to use this string as, e.g., a username. If a username is necessary to log in to a system, a user will not remember whether he or she accidentally typed a whitespace character into the form field. So we have to remove the spaces, tabs and line breaks.</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>output = input_string.replace(" ", "") 
output = output.replace("\t", "") 
output = output.replace("\n", "")</code></pre>
</div>
<p>This is a bit messy. With RegEx we can use the &#8220;\s&#8221; meta-character instead:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>import re 
output = re.sub(r"\s", "", input_string)</code></pre>
</div>
<p>From the <a href="https://docs.python.org/2/library/re.html">Python Doc</a>:</p>
<p>&#8220;When the <a class="reference internal" title="re.UNICODE" href="https://docs.python.org/2/library/re.html#re.UNICODE"><code class="xref py py-const docutils literal"><span class="pre">UNICODE</span></code></a> flag is not specified, it matches any whitespace character, this is equivalent to the set <code class="docutils literal"><span class="pre">[</span> <span class="pre">\t\n\r\f\v]</span></code>.&#8221;</p>
<p>Please take this just as an example. In production code you would use &#8220;strip()&#8221; to remove leading and trailing whitespace.</p>
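<p>The two approaches are not interchangeable, though. A small sketch with a made-up input string that also contains an inner space shows the difference:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>import re

input_string = " \tJoern Boegeholz \n"

# strip() only removes leading and trailing whitespace
stripped = input_string.strip()

# re.sub with \s removes every whitespace character, including the inner space
collapsed = re.sub(r"\s", "", input_string)

print(repr(stripped))   # 'Joern Boegeholz'
print(repr(collapsed))  # 'JoernBoegeholz'</code></pre>
</div>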
<p>OK, here is the catch: I cannot remember the meta-characters. That makes working with RegEx cumbersome for me.</p>
<h2>First step</h2>
<p>Each meta-character is represented as a constant.</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>ANY_CHAR = '.' 
DIGIT = r'\d' 
NON_DIGIT = r'\D' 
WHITESPACE = r'\s' 
NON_WHITESPACE = r'\S' 
ALPHA = '[a-zA-Z]' 
ALPHANUM = r'\w' 
NON_ALPHANUM = r'\W'</code></pre>
</div>
<h2>Second Step</h2>
<p>We wrap the quantifiers in convenience functions.</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>def zero_or_more(string): 
    return string + '*' 

def zero_or_once(string): 
    return string + '?' 

def one_or_more(string): 
    return string + '+'</code></pre>
</div>
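<p>Because these functions simply return strings, their results can be concatenated and passed directly to the re module. A minimal, self-contained sketch (the test strings are made up):</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>import re


def zero_or_more(string):
    return string + '*'


def one_or_more(string):
    return string + '+'


# One or more digits, followed by any number of whitespace characters
regex = one_or_more(r'\d') + zero_or_more(r'\s')

print(regex)                                # \d+\s*
print(bool(re.fullmatch(regex, "2017  ")))  # True
print(bool(re.fullmatch(regex, "abc")))     # False</code></pre>
</div>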
<h2>Third Step</h2>
<p>As syntactic sugar we introduce a class which encapsulates the pattern:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>class Pattern:

    def __init__(self):
        self.pattern = ''

    def starts_with(self, start_str):
        self.pattern += start_str
        return self

    def followed_by(self, next_string):
        self.pattern += next_string
        return self

    def __str__(self):
        return self.pattern

    def __repr__(self):
        return self.pattern</code></pre>
</div>
<h2>Result</h2>
<p>Instead of writing</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>pattern = r"\d\D+\s{2,4}"</code></pre>
</div>
<p>you can now write</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>pattern = Pattern()
pattern.starts_with(DIGIT)\
    .followed_by(one_or_more(NON_DIGIT))\
    .followed_by(between(2, 4, WHITESPACE))
</code></pre>
</div>
<p>which is more human readable.</p>
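<p>The between function used above is not defined in this article. A minimal sketch, assuming it simply appends a {min,max} quantifier (the signature is inferred from the call above and may differ from the published package), shows that the fluent version produces the same regex and can be passed to the re module:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>import re

DIGIT = r'\d'
NON_DIGIT = r'\D'
WHITESPACE = r'\s'


def one_or_more(string):
    return string + '+'


def between(minimum, maximum, string):
    # Hypothetical helper: repeat `string` at least `minimum` and at most `maximum` times
    return string + '{' + str(minimum) + ',' + str(maximum) + '}'


class Pattern:

    def __init__(self):
        self.pattern = ''

    def starts_with(self, start_str):
        self.pattern += start_str
        return self

    def followed_by(self, next_string):
        self.pattern += next_string
        return self

    def __str__(self):
        return self.pattern


pattern = Pattern()
pattern.starts_with(DIGIT)\
    .followed_by(one_or_more(NON_DIGIT))\
    .followed_by(between(2, 4, WHITESPACE))

print(str(pattern))                                 # \d\D+\s{2,4}
print(bool(re.fullmatch(str(pattern), "1abc   ")))  # True</code></pre>
</div>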
<h2>My first PyPI package</h2>
<p>After using</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>pip install &lt;module_name&gt;</code></pre>
</div>
<p>for a couple of years, I wanted to know how I can upload a new package to PyPI or the &#8220;Python Package Index&#8221;, so I&#8217;ve written another tutorial:</p>
<p><a href="https://creatronix.de/distributing-your-own-package-on-pypi/">Distributing your own package on PyPi</a></p>
<p>At the moment it&#8217;s a pet project, but if you are interested you can install the code via</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>pip install easy_pattern</code></pre>
</div>
<h2>Links</h2>
<p><a href="https://pypi.org/project/easy_pattern/">PyPi</a></p>
<p><a href="https://github.com/jboegeholz/easypattern">Github</a></p>
<p>The post <a href="https://creatronix.de/regular-expressions-demystified/">Regular Expressions Demystified &#8211; A Mini DSL for Regex in Python</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
