Data Science & SQL Archives - Creatronix

Quality metrics for binary classification (Qualitätsmetriken bei binärer Klassifikation)

Jörn — Mon, 13 Jul 2026 06:44:34 +0000

Motivation

At the Universum Bremen we visited the special exhibition on AI. At one station you could test whether you can tell chihuahuas from muffins. The kids had a blast because it was timed and they could turn it into a competition: whoever sorted the cards faster won. A better time does not count, however, if cards were sorted incorrectly along the way.

To check how good a classification is, we can define metrics. First, though, we need a few basic terms.

Terms

Telling a chihuahua from a muffin is not that easy. There are four possible outcomes: a chihuahua can be correctly recognized as a chihuahua, or it is mistakenly classified as a muffin. Conversely, a muffin can be correctly recognized as a muffin, or it is mistaken for a chihuahua.

The class of objects you want to detect is the positive class. In our case we want to detect chihuahuas.

Actual	Predicted	Outcome
Chihuahua	Chihuahua	True Positive
Muffin	Chihuahua	False Positive
Chihuahua	Muffin	False Negative
Muffin	Muffin	True Negative

A false positive is also called a type I error, a false negative a type II error.

Example

Suppose my daughter looks at twenty pictures of muffins and chihuahuas (8 muffins and 12 chihuahuas) and classifies 10 of them as chihuahuas, but 2 of those ten are actually muffins. That gives the following picture:

8 true positives – correctly recognized as chihuahuas
2 false positives – muffins mistakenly classified as chihuahuas
6 true negatives – correctly recognized as muffins
4 false negatives – chihuahuas mistakenly classified as muffins

We use these values to calculate the metrics below.
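The four counts can be double-checked with a few lines of Python. This is only a sketch: the two label lists are made up so that they match the example (12 chihuahuas, 8 muffins, 10 pictures classified as chihuahua), and the encoding 1 = chihuahua, 0 = muffin is an assumption of the sketch.

```python
# 1 = chihuahua (positive class), 0 = muffin (negative class)
# Hypothetical label lists that reproduce the example above.
actual = [1] * 12 + [0] * 8
predicted = [1] * 8 + [0] * 4 + [1] * 2 + [0] * 6

pairs = list(zip(actual, predicted))
tp = sum(a == 1 and p == 1 for a, p in pairs)  # chihuahua recognized as chihuahua
fp = sum(a == 0 and p == 1 for a, p in pairs)  # muffin taken for a chihuahua
fn = sum(a == 1 and p == 0 for a, p in pairs)  # chihuahua taken for a muffin
tn = sum(a == 0 and p == 0 for a, p in pairs)  # muffin recognized as muffin

print(tp, fp, fn, tn)
```

The four counts always add up to the number of pictures, which is a handy sanity check.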

Metrics

Metrics are used to check the quality of machine learning algorithms. They work particularly well for supervised learning, because labeled training and test data (ground-truth labels) are available, so we can check whether a classification was correct or not.

Precision

Precision, also known as positive predictive value, is the share of correctly positive classifications among all results classified as positive.

Precision = True Positive / (True Positive + False Positive)

In our example the precision is 8 / (8 + 2) = 4/5 or 80 %.

Recall

Recall, also called sensitivity, is the probability that a positive object is correctly classified as positive.

Recall = True Positive / (True Positive + False Negative)

Since we know that there are actually 12 chihuahuas and my daughter misses 4 of them (false negatives), her recall is 8 / (8 + 4) = 2/3 or about 67 %.

Accuracy

The accuracy score is the share of all correctly classified results:

Accuracy = (TP + TN) / (TP + FP + TN + FN)

In our example this gives:

(8 + 6) / 20 = 14/20 = 7/10 = 70 %

F1-Score

The F1 score is the harmonic mean of precision and recall and is a good choice when both are equally important.

F1 = 2 / (1 / Precision + 1 / Recall)

For our example: 2 / (1 / (4/5) + 1 / (2/3)) = 2 / (5/4 + 3/2) = 2 / (11/4) = 2 * (4/11) = 8/11 ≈ 0.727
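To re-check the arithmetic, here is a minimal sketch that computes all four metrics from the counts of the example. It uses Python's fractions module so the results stay exact; the variable names are my own choice.

```python
from fractions import Fraction

# counts from the chihuahua vs. muffin example
tp, fp, tn, fn = 8, 2, 6, 4

precision = Fraction(tp, tp + fp)
recall = Fraction(tp, tp + fn)
accuracy = Fraction(tp + tn, tp + fp + tn + fn)
f1 = 2 / (1 / precision + 1 / recall)  # harmonic mean of precision and recall

for name, value in [("precision", precision), ("recall", recall),
                    ("accuracy", accuracy), ("f1", f1)]:
    print(f"{name}: {value} = {float(value):.3f}")
```

The harmonic mean can also be written directly in terms of the counts as 2 * TP / (2 * TP + FP + FN), which gives the same 8/11.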

Conclusion

Precision, recall, accuracy and F1 score are four basic metrics for assessing the quality of a binary classification model. Which metric matters most depends on the use case:

If it is especially important not to miss any positive cases (e.g. in medical diagnoses), recall moves into focus.
If, on the other hand, too many false alarms should be avoided, precision is the more relevant measure.
The F1 score is a good choice when both aspects should be weighted equally.

Beyond that there are further tools, such as the ROC AUC score or the confusion matrix, that can give an even more detailed picture.

Unsupervised learning follows different rules: since there are no ground-truth labels, it uses its own measures such as the silhouette coefficient or the Davies-Bouldin index, but that is material for a separate article.

The post Qualitätsmetriken bei binärer Klassifikation appeared first on Creatronix.

How to deal with date and time in SQLite

Jörn — Mon, 14 Nov 2022 20:41:47 +0000

Motivation

When dealing with databases you will need the ability to store certain dates and/or timestamps in your tables.
Let’s find out how you can do that in an SQLite database.

SQL table creation

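SQLite has no dedicated storage class for dates and times, but a column can be declared as TIMESTAMP. Python's sqlite3 module can use this declared type to convert the stored values into datetime objects.

CREATE TABLE social_media (
        social_media_id INTEGER,
        insertion_date TIMESTAMP,
        yt_subs INTEGER,
        fb_pg INTEGER,
        insta_pg INTEGER,
        twitter_pg INTEGER);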

Python code

In our Python code we have to pass this parameter when opening the connection:

detect_types=sqlite3.PARSE_DECLTYPES | sqlite3.PARSE_COLNAMES

import sqlite3

# detect_types lets sqlite3 convert columns declared as TIMESTAMP into datetime objects
db_connection = sqlite3.connect('bmv.db', detect_types=sqlite3.PARSE_DECLTYPES | sqlite3.PARSE_COLNAMES)
cursor = db_connection.cursor()
cursor.execute('''SELECT * FROM social_media ORDER BY insertion_date DESC''')
rows = cursor.fetchall()

Now we will get datetime objects from our query.
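The round trip can be tried out without a database file. The following is a minimal sketch, not the code from my project: it uses an in-memory database and a shortened, made-up version of the table. Since Python 3.12 the default date/time adapters and converters of sqlite3 are deprecated, so the sketch registers its own pair explicitly; the two lambda functions are my assumption of a reasonable ISO format conversion.

```python
import sqlite3
from datetime import datetime

# Explicit adapter (datetime -> text) and converter (bytes -> datetime).
# The converter name must match the declared column type (case-insensitive).
sqlite3.register_adapter(datetime, lambda value: value.isoformat(" "))
sqlite3.register_converter("TIMESTAMP", lambda raw: datetime.fromisoformat(raw.decode()))

connection = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
connection.execute("CREATE TABLE social_media (insertion_date TIMESTAMP, yt_subs INTEGER)")

inserted = datetime(2022, 11, 14, 20, 41, 47)
connection.execute("INSERT INTO social_media VALUES (?, ?)", (inserted, 100))

row = connection.execute("SELECT insertion_date, yt_subs FROM social_media").fetchone()
connection.close()

print(type(row[0]).__name__, row[0])
```

Without detect_types the same query would hand back the stored text instead of a datetime object.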

SQL-Tutorial

Jörn — Mon, 17 Jan 2022 13:10:11 +0000

I really like SQL. But sometimes I struggle to understand some of the concepts. So I write about it.

By now there are quite a few articles, so it is time for an overview page:


2021 – Advent of code – Day 3

Jörn — Tue, 07 Dec 2021 10:18:53 +0000

Part 1

You need to use the binary numbers in the diagnostic report to generate two new binary numbers (called the gamma rate and
the epsilon rate). The power consumption can then be found by multiplying the gamma rate by the epsilon rate.

Each bit in the gamma rate can be determined by finding the most common bit in the corresponding position of all numbers
in the diagnostic report. For example, given the following diagnostic report:

00100
11110
10110
10111
10101
01111
00111
11100
10000
11001
00010
01010

Considering only the first bit of each number, there are five 0 bits and seven 1 bits. Since the most common bit is 1, the first bit of the gamma rate is 1.

The most common second bit of the numbers in the diagnostic report is 0, so the second bit of the gamma rate is 0.

The most common value of the third, fourth, and fifth bits are 1, 1, and 0, respectively, and so the final three bits of the gamma rate are 110.

So, the gamma rate is the binary number 10110, or 22 in decimal.

The epsilon rate is calculated in a similar way; rather than use the most common bit, the least common bit from each position is used.
So, the epsilon rate is 01001, or 9 in decimal. Multiplying the gamma rate (22) by the epsilon rate (9) produces the power consumption, 198.

Use the binary numbers in your diagnostic report to calculate the gamma rate and epsilon rate, then multiply them together. What is the power consumption of the submarine? (Be sure to represent your answer in decimal, not binary.)

Reading the data

import pandas as pd

df = pd.read_csv("./aoc_day_03_test_data.txt", dtype = str, header=None)
df.columns = ["original"]

We need to read the input as strings via dtype=str. Otherwise the values are parsed as integers and we lose the leading zeroes.

	original
0	00100
1	11110
2	10110
3	10111
4	10101
5	01111
6	00111
7	11100
8	10000
9	11001
10	00010
11	01010

Now we can split the strings into individual bits with the str.split() function.

To get a separate column for every bit we turn the result into a list with tolist() and build a new data frame from it. Splitting on the empty string produces an empty first and last column, so let’s remove them.

df = pd.DataFrame(df["original"].str.split('').tolist())
del df[0]
df = df.iloc[:,:-1]
df

	1	2	3	4	5
0	0	0	1	0	0
1	1	1	1	1	0
2	1	0	1	1	0
3	1	0	1	1	1
4	1	0	1	0	1
5	0	1	1	1	1
6	0	0	1	1	1
7	1	1	1	0	0
8	1	0	0	0	0
9	1	1	0	0	1
10	0	0	0	1	0
11	0	1	0	1	0

Now we have to find out which value occurs most often in each column.

df = pd.concat([df[column].value_counts() for column in df], axis = 1) 
df

	1	2	3	4	5
1	7	5	8	7	5
0	5	7	4	5	7

The most common bit of each column is the index label with the highest count:

most_common_bits = df.idxmax()
most_common_bits

1    1
2    0
3    1
4    1
5    0
dtype: object

Now we have to concatenate these bits into one string and interpret it as a binary number:

gamma_rate = ''.join(most_common_bits)
gamma_rate = int(gamma_rate, 2)
gamma_rate

That gives us a gamma rate of 22.

The epsilon rate can be calculated by inverting all significant bits of the gamma rate. We do this by creating a bit mask that has a 1 in every bit position

bit_mask = 2 ** len(most_common_bits) - 1

and applying it with an XOR to the gamma rate:

epsilon_rate = gamma_rate ^ bit_mask
epsilon_rate

The solution is:

gamma_rate * epsilon_rate
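For comparison, here is a small sketch of the same idea in vanilla Python, run on the sample report from the puzzle text. The function name is made up, and resolving a tie in favour of 1 is my own assumption because part 1 of the puzzle does not say what to do in that case.

```python
report = [
    "00100", "11110", "10110", "10111", "10101", "01111",
    "00111", "11100", "10000", "11001", "00010", "01010",
]


def power_consumption(lines):
    """Return (gamma, epsilon, gamma * epsilon) for a list of bit strings."""
    width = len(lines[0])
    gamma_bits = ""
    for position in range(width):
        ones = sum(line[position] == "1" for line in lines)
        # assumption: a tie counts as "1 is most common"
        gamma_bits += "1" if ones * 2 >= len(lines) else "0"
    gamma = int(gamma_bits, 2)
    epsilon = gamma ^ (2 ** width - 1)  # flip all significant bits
    return gamma, epsilon, gamma * epsilon


gamma, epsilon, power = power_consumption(report)
print(gamma, epsilon, power)
```

For the sample data this prints 22 9 198, the values given in the puzzle description.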


2021 – Advent of code – Day 2

Jörn — Fri, 03 Dec 2021 11:23:34 +0000

Part 1

Today the puzzle got a bit trickier than Day 1.

The submarine seems to already have a planned course (your puzzle input). You should probably figure out where it's going. For example:

    forward 5
    down 5
    forward 8
    up 3
    down 8
    forward 2

Your horizontal position and depth both start at 0. The steps above would then modify them as follows:

    forward 5 adds 5 to your horizontal position, a total of 5.
    down 5 adds 5 to your depth, resulting in a value of 5.
    forward 8 adds 8 to your horizontal position, a total of 13.
    up 3 decreases your depth by 3, resulting in a value of 2.
    down 8 adds 8 to your depth, resulting in a value of 10.
    forward 2 adds 2 to your horizontal position, a total of 15.

After following these instructions, you would have a horizontal position of 15 and a depth of 10. (Multiplying these together produces 150.)

Calculate the horizontal position and depth you would have after following the planned course. What do you get if you multiply your final horizontal position by your final depth?

So Pandas here we go again:

import pandas as pd

df = pd.read_csv("./aoc_day_02_data.txt", delimiter=" ", header=None)
df.columns = ["command", "value"]

Alright, reading in the data and naming the columns are the same steps as yesterday. Now we have two columns.

	command	value
0	forward	5
1	down	5
2	forward	8
3	up	3
4	down	8
5	forward	2

horizontal = df[df['command']=="forward"]["value"].sum()

The horizontal value can be calculated with the sum function when we filter the data frame to rows where the command is “forward”.

depth = df[df['command']=="down"]["value"].sum() - df[df['command']=="up"]["value"].sum()

The depth can be calculated by summing up the down and the up commands separately and subtracting the up sum from the down sum.

Now we have to multiply the depth and the horizontal position to get the solution:

position = depth * horizontal
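To check the logic against the example from the puzzle text, part 1 can also be sketched without pandas. The course list and the function name are only there for illustration.

```python
course = ["forward 5", "down 5", "forward 8", "up 3", "down 8", "forward 2"]


def follow_course(commands):
    """Return (horizontal, depth) after applying the part 1 rules."""
    horizontal = 0
    depth = 0
    for command in commands:
        direction, amount = command.split()
        amount = int(amount)
        if direction == "forward":
            horizontal += amount
        elif direction == "down":
            depth += amount
        elif direction == "up":
            depth -= amount
    return horizontal, depth


horizontal, depth = follow_course(course)
print(horizontal, depth, horizontal * depth)
```

For the sample course this gives a horizontal position of 15, a depth of 10 and a product of 150.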

Part 2

    down X increases your aim by X units.
    up X decreases your aim by X units.
    forward X does two things:
        It increases your horizontal position by X units.
        It increases your depth by your aim multiplied by X.

Now, the above example does something different:

    forward 5 adds 5 to your horizontal position, a total of 5. Because your aim is 0, your depth does not change.
    down 5 adds 5 to your aim, resulting in a value of 5.
    forward 8 adds 8 to your horizontal position, a total of 13. Because your aim is 5, your depth increases by 8*5=40.
    up 3 decreases your aim by 3, resulting in a value of 2.
    down 8 adds 8 to your aim, resulting in a value of 10.
    forward 2 adds 2 to your horizontal position, a total of 15. Because your aim is 10, your depth increases by 2*10=20 to a total of 60.

After following these new instructions, you would have a horizontal position of 15 and a depth of 60. 
(Multiplying these produces 900.)

Using this new interpretation of the commands, calculate the horizontal position and depth you would have after following the planned course. 
What do you get if you multiply your final horizontal position by your final depth?

To get an overview I simplified the table

Here I had a hard time doing it with pandas, so vanilla Python to the rescue:

if __name__ == '__main__':

    # read the course and strip the trailing newlines
    with open("./aoc_day_02_data.txt") as file:
        lines = [line.rstrip() for line in file]

    horizontal = 0
    current_aim = 0
    depth = 0
    for line in lines:
        command, value = line.split(" ")
        value = int(value)
        if command == "forward":
            # moving forward also changes the depth, depending on the current aim
            horizontal += value
            depth += value * current_aim
        elif command == "down":
            current_aim += value
        elif command == "up":
            current_aim -= value

    print(f"horizontal: {horizontal}")
    print(f"depth: {depth}")
    print(horizontal * depth)

Update

I’ve figured out how to do it with Pandas as well

import pandas as pd

df = pd.read_csv("./aoc_day_02_test_data.txt", delimiter=" ", header=None)
df.columns = ["command", "value"]

is_forward = df["command"] == "forward"
horizontal = df.loc[is_forward, "value"].sum()

# "up" decreases the aim, so flip the sign of its values
df.loc[df["command"] == "up", "value"] *= -1

# only "down" and "up" change the aim, "forward" rows contribute 0
df["aim"] = 0
df.loc[~is_forward, "aim"] = df.loc[~is_forward, "value"]
df["current_aim"] = df["aim"].cumsum()

# every "forward" increases the depth by its value times the current aim
df.loc[is_forward, "depth"] = df.loc[is_forward, "value"] * df.loc[is_forward, "current_aim"]
depth = df["depth"].sum()

print(horizontal * depth)
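The trick behind the pandas version is the cumulative sum: the aim at any row is just the running total of all aim changes so far. Here is a sketch of that idea with itertools.accumulate on the sample course; the tuple representation of the commands is my own choice for the example.

```python
from itertools import accumulate

course = [("forward", 5), ("down", 5), ("forward", 8),
          ("up", 3), ("down", 8), ("forward", 2)]

# aim change per row: +value for down, -value for up, 0 for forward
aim_changes = [
    value if command == "down" else -value if command == "up" else 0
    for command, value in course
]
running_aim = list(accumulate(aim_changes))  # same idea as cumsum()

horizontal = sum(value for command, value in course if command == "forward")
depth = sum(
    value * aim
    for (command, value), aim in zip(course, running_aim)
    if command == "forward"
)

print(running_aim)
print(horizontal, depth, horizontal * depth)
```

For the sample course the running aim is [0, 5, 5, 2, 10, 10], which leads to the depth of 60 and the product of 900 from the puzzle text.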


2021 – Advent of code – Day 1

Jörn — Thu, 02 Dec 2021 09:30:22 +0000

I haven’t participated in the Advent of Code before, but I’ve always been curious.

What is Advent of Code?

It’s an Advent calendar for programmers. You get 25 challenges, one per day starting December 1st. Caveat: each challenge has two parts, and you have to solve the first part to unlock the second 🙂

Day 1 Challenge – Part 1

On the first day your first task is to count how many times a value is bigger than its predecessor. They give us some sample data:

199 N/A
200 bigger
208 bigger
210 bigger
200 smaller
207 bigger
240 bigger
269 bigger
260 smaller
263 bigger

Counting the values that are bigger than their predecessor gives us seven.

The actual data contains 2000 rows. This isn’t exactly big data, but I wanted to dust off my pandas skills, so here we go:

Let’s look at the data

import pandas as pd

df = pd.read_csv("./aoc_day_01_data.txt", header=None)
df.describe()

With the read_csv() function we can read in our data file and convert it into a data frame. It’s important to pass header=None. Otherwise pandas assumes the first row is a column header.

df.describe() gives us summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) of the column.

Because we want to reference the columns by name we add a column header:

df.columns = ["original"]

To compare the nth value with its (n+1)th neighbour we add a new column that contains the values shifted up by one row:

df['shifted'] = df['original'].shift(-1)

The output looks like this:

	original	shifted
0	159	158.0
1	158	174.0
2	174	196.0
3	196	197.0
4	197	194.0
…	…	…
1995	8538	8543.0
1996	8543	8545.0
1997	8545	8557.0
1998	8557	8568.0
1999	8568	NaN

We add another column where we place the value True when the value from the current row in the shifted column is bigger than in the original column:

df['increased'] = (df['shifted'] > df['original'])

Now it starts to look like the sample data from the introduction:

	original	shifted	increased
0	159	158.0	False
1	158	174.0	True
2	174	196.0	True
3	196	197.0	True
4	197	194.0	False
…	…	…	…
1995	8538	8543.0	True
1996	8543	8545.0	True
1997	8545	8557.0	True
1998	8557	8568.0	True
1999	8568	NaN	False

The last thing we have to do is count how many times True occurs:

true_count = df['increased'].sum()

which gives us 1583.

This is a bit of a hack because it relies on True counting as 1 and False as 0 when the column is summed.
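Why the sum works can be seen in plain Python, where bool is a subclass of int. The following sketch uses the sample values from the introduction instead of the real data file.

```python
values = [199, 200, 208, 210, 200, 207, 240, 269, 260, 263]

# one True/False per pair of neighbours
increased = [current > previous for previous, current in zip(values, values[1:])]

print(issubclass(bool, int))  # True counts as 1, False as 0
print(sum(increased))         # number of increases in the sample data
```

For the sample data the sum is seven, the number we counted by hand.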

A more elegant solution is to use value_counts:

df['increased'].value_counts(dropna=False)

Now the output is:

True     1583
False     417
Name: increased, dtype: int64

And 1583 is the number we are looking for. This earned us our first golden star and unlocked the second part of the challenge:

Part 2

The second part is a bit more challenging because we have to sum up three adjacent values (a sliding window) and compare each sum to the sum of the next window, which starts one value later.

199  A       
200  A B     
208  A B C   
210    B C D
200  E   C D
207  E F   D
240  E F G
269    F G H
260      G H
263        H

I created a new notebook and started like part 1 with reading the data and naming the first column

import pandas as pd

df = pd.read_csv("./aoc_day_01_data.txt", header=None)
df.columns = ["original"]

To add the sum of three values to the row of the first value we use the following code

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=3)
df["rolling_sum"] = df.original.rolling(window=indexer).sum()

This demonstrates the power of pandas once more: sliding window functions are built in!

The rest is the same as in part one: shift, compare and count.

df['shifted_rs'] = df['rolling_sum'].shift(-1)
df['increased_rs'] = (df['shifted_rs'] > df['rolling_sum'])
true_count = df['increased_rs'].sum()
true_count

As a little finger exercise I did the same with vanilla Python:

data = []
with open("./aoc_day_01_test_data.txt") as f:
    for line in f:
        data.append(int(line.rstrip()))

triplet_sums = []

for i, v in enumerate(data):
    if i < (len(data) - 2):
        triplet_sum = data[i] + data[i+1] + data[i+2]
        triplet_sums.append(triplet_sum)
print(triplet_sums)

sums_larger_than_previous_sums = 0
for i, v in enumerate(triplet_sums):
    if i < (len(triplet_sums) - 1):
        if triplet_sums[i] < triplet_sums[i+1]:
            sums_larger_than_previous_sums += 1

print(sums_larger_than_previous_sums)

Which works but is less elegant.
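A shorter variant is possible because two neighbouring windows share two of their three values: window B is bigger than window A exactly when the value that enters is bigger than the value that drops out. The sketch below uses this shortcut on the sample data; the function name and the window parameter are my own additions.

```python
values = [199, 200, 208, 210, 200, 207, 240, 269, 260, 263]


def count_window_increases(values, window=3):
    """Count how often a sliding-window sum is bigger than the previous one."""
    # sum(values[i+1:i+1+window]) > sum(values[i:i+window])
    # is equivalent to values[i+window] > values[i]
    return sum(later > earlier for earlier, later in zip(values, values[window:]))


print(count_window_increases(values))            # part 2
print(count_window_increases(values, window=1))  # part 1 as a special case
```

For the sample data this gives five increases for the three-value windows and seven for single values.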

Stay tuned for more!


k-fold crossvalidation with sklearn

Jörn — Fri, 05 Mar 2021 13:39:36 +0000

from sklearn.model_selection import KFold

# df_train is assumed to be an existing DataFrame with the features
# in columns 4 and 5 and the target in column 6
kf = KFold(n_splits=2)

for step, (train_index, val_index) in enumerate(kf.split(df_train), start=1): # for each fold
    print('Step ', step)

    features_fold_train = df_train.iloc[train_index, [4, 5]] # features matrix of training data (of this step)
    features_fold_val = df_train.iloc[val_index, [4, 5]] # features matrix of validation data (of this step)

    target_fold_train = df_train.iloc[train_index, 6] # target vector of training data (of this step)
    target_fold_val = df_train.iloc[val_index, 6] # target vector of validation data (of this step)

    print("VALIDATE:", val_index)
    print('Dimensions features matrix for validation: ', features_fold_val.shape)
    print("TRAIN:", train_index)
    print('Dimensions features matrix for training: ', features_fold_train.shape, '\n')
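To see what the split does to the row indices, here is a sketch in plain Python. It is not sklearn's implementation, only my understanding of unshuffled k-fold splitting: every fold is a consecutive block of rows, and if the rows cannot be divided evenly the first folds get one row more. The function name is made up.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, validation_indices) for consecutive, unshuffled folds."""
    indices = list(range(n_samples))
    base_size, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # the first `extra` folds are one sample larger
        size = base_size + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=5, n_splits=2))
for step, (train, validation) in enumerate(folds, start=1):
    print("Step", step, "TRAIN:", train, "VALIDATE:", validation)
```

Every row ends up in the validation set exactly once, which is the point of k-fold cross-validation.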


Pandas Cheat Sheet

Jörn — Fri, 05 Mar 2021 13:17:19 +0000

If you are new to Pandas feel free to read Introduction to Pandas

I’ve assembled some pandas code snippets

Reading Data

Reading CSV

import pandas as pd

# read from csv
df = pd.read_csv("path_to_file")

This also works for plain text files, the file suffix is ignored.

The default delimiter for comma separated value files is the comma. If your data uses another delimiter you can specify it via:

delimiter=";"

If your data has no header you can pass header=None into the function

df = pd.read_csv("./aoc_day_01_data.txt", header=None)

With skiprows you can start reading in at any row

skiprows=8

Sometimes you need to alter the encoding as well:

encoding="cp1252"

Reading Excel

You can read Excel files as well, but you need to install openpyxl first:

pip install openpyxl

df = pd.read_excel("./my_excel_sheet.xlsx")

With sheet_name you can select the individual sheet:

sheet_name="my_sheet_1"

Inspecting data

Basic information

df.describe()

Length

len(df)

showing entries

df.head()

df.tail(10)

Indexing

df['A']

gives you column A

iloc gives you entries based on their integer position

#      [row, column]
df.iloc[0,   0]

loc gives you entries based on their labels

#     [row, column]
df.loc[:, :]
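The difference between the two is easiest to see on a data frame whose row labels are not numbers. A small sketch with made-up data:

```python
import pandas as pd

# hypothetical example data with string labels as index
df = pd.DataFrame({"A": [10, 20, 30], "B": [1.5, 2.5, 3.5]}, index=["x", "y", "z"])

first_cell = df.iloc[0, 0]       # by position: first row, first column
same_cell = df.loc["x", "A"]     # by label: row "x", column "A"

position_slice = df.iloc[0:1, 0]    # position slices exclude the end
label_slice = df.loc["x":"y", "A"]  # label slices include the end

print(first_cell, same_cell)
print(len(position_slice), len(label_slice))
```

Note that slicing behaves differently: with iloc the end position is excluded, with loc the end label is included.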

Data Cleaning

Dropping columns

del df["column_name"]

Renaming columns

df.columns = ["new_column_name", ...]

Comparing columns

df['increased'] = (df['shifted'] > df['original'])

Shifting columns

df['shifted'] = df['original'].shift(-1)

Splitting

Splitting strings into individual columns

df = pd.DataFrame(df["original"].str.split('').tolist())

Counting and Calculating

Summing columns

df["value"].sum()

Cumulative sum

df["aim"].cumsum()

Rolling sum

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=3)
df["rolling_sum"] = df.original.rolling(window=indexer).sum()

Counting value occurrences

df['increased'].value_counts(dropna=False)

Counting occurrences for all columns

df = pd.concat([df[column].value_counts() for column in df], axis = 1)

Convert column to datetime

df["date"] = pd.to_datetime(df["date"])

Convert datetime to minutes since midnight

df["msm"] = df["date"].dt.hour * 60 + df["date"].dt.minute
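The arithmetic behind the last snippet can be checked with a single timestamp from the standard library. The helper name and the example time are made up for this sketch.

```python
from datetime import datetime


def minutes_since_midnight(moment):
    """Hours converted to minutes plus the minutes of the current hour."""
    return moment.hour * 60 + moment.minute


example = datetime(2021, 3, 5, 13, 17)
print(minutes_since_midnight(example))  # 13 * 60 + 17
```

Seconds are ignored here, just like in the pandas version.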


Data Science Pipeline

Jörn — Fri, 05 Mar 2021 13:16:11 +0000

Motivation

Learning Data Science can be grueling and overwhelming sometimes.

When I feel too overwhelmed it’s time to draw a picture.

This is my current overview of what a data scientist has to do:

General tools

Linear Algebra with numpy

numpy random choice

Numpy linspace function

Data acquisition

Data Science Datasets: Iris flower data set

Data cleaning

Pandas Cheat Sheet

Data exploration

Introduction to Pandas

Feature Scaling

Estimator fit

Linear Regression with sklearn cheat sheet

Lesson 2: Naive Bayes

Lesson 3: Support Vector Machines

Validation

What is Cross-Validation in Data Science?

k-fold crossvalidation with sklearn

Visualization

Introduction to matplotlib

Introduction to matplotlib – Part 2

Introduction to matplotlib – Part 3

Scatterplot with matplotlib

Interpretation

Metrics about the quality of your model

Classification: Precision and Recall

Receiver Operating Characteristic

Confusion Matrix


Introduction to the Julia programming language

Jörn — Mon, 08 Jun 2020 12:21:22 +0000

Installation

You can download the installer at https://julialang.org/

On Windows you have to add Julia's bin directory to the PATH manually, e.g. C:\Julia-1.3.1\bin

Let’s fire up a command line and type: julia

You will be greeted with a nice little ASCII art and the prompt.

If you want to close the prompt you can use CTRL + D

Hello Julia

To get it over with, here is the hello world:

julia> show("Hello Julia")
"Hello Julia"

Julia has a show function which prints a text representation of a value. For a string that includes the quotes.

But there is also a print function, which prints the plain text:

julia> print("Hello Julia")
Hello Julia

Writing Programs

Julia code is put into files with the file extension .jl

So let’s write the hello world to a file called hello-julia.jl

We can start it via

julia hello-julia.jl

Reading and Writing Files

open("data.txt") do file
    for line in eachline(file)
        println(line)
    end
end

With enumerate you also get the line number:

open("data.txt") do file
    for line in enumerate(eachline(file))
        println(line[1], ": ", line[2])
    end
end

Reading CSV

As a developer with interest in data science I want to know what packages exist to handle data. The first one is the csv package to deal with comma separated values.

julia> import Pkg; Pkg.add(["CSV", "DataFrames"])

julia> using CSV, DataFrames
julia> csv = CSV.read("./my_cars.csv", DataFrame)

5×4 DataFrames.DataFrame
│ Row │ Make   │  Model  │  Bought │  Sold │
│     │ String │ String  │ Int64   │ Int64 │
├─────┼────────┼─────────┼─────────┼───────┤
│ 1   │ Fiat   │  Punto  │ 2005    │ 2009  │
│ 2   │ Ford   │  Escort │ 2009    │ 2010  │
│ 3   │ Ford   │  Fusion │ 2003    │ 2012  │
│ 4   │ Audi   │  A5     │ 2012    │ 2099  │
│ 5   │ Honda  │  Jazz   │ 2015    │ 2099  │

Plots

import Pkg; Pkg.add("Plots")
using Plots
x = 1:10; 
y = rand(10); 
plot(x, y)

