When you finished reading part 1 of the introduction, you might have wondered how to draw more than one line or curve into one plot. I will show you now.
To make it a bit more interesting, we generate two functions: sine and cosine. We generate our x-values with numpy’s linspace function. Continue reading “Introduction to matplotlib – Part 2”
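As a rough sketch of the idea described above, the snippet below draws sine and cosine into the same axes. The x-range (0 to 2π), the number of points, the output file name and the non-interactive "Agg" backend are assumptions made for this illustration, not taken from the article.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# 100 evenly spaced x-values between 0 and 2*pi (range and count are assumptions)
x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sine")    # first curve
ax.plot(x, np.cos(x), label="cosine")  # second curve, drawn into the same axes
ax.legend()
fig.savefig("sine_cosine.png")  # hypothetical output file name
```

Each call to plot adds another line to the same axes, so both curves end up in one figure.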
In my article My personal road map for learning data science in 2018 I wrote about how I try to tackle the data science knowledge sphere. Since 2018 is slowly coming to an end, I think it is time for a little wrap-up.
What are the things I learned about Data Science in 2018? Here we go:
1. The difference between Data Science, Machine Learning, Deep Learning and AI
Continue reading “10 things I didn’t know about Data Science a year ago”
matplotlib is the workhorse of data science visualization. The module pyplot gives us MATLAB-like plots.
The most basic plot is done with the “plot” function. It looks like this:
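A minimal sketch of such a basic plot could look like the following. The y-values, the output file name and the "Agg" backend are made up for illustration; the article's own example may differ.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

y_values = [1, 4, 9, 16]  # hypothetical data

# With only y-values given, plot uses 0, 1, 2, ... as x-values
lines = plt.plot(y_values)
plt.savefig("basic_plot.png")  # hypothetical output file name
```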
Continue reading “Introduction to matplotlib”
What is Feature Scaling?
Feature Scaling is an important pre-processing step for some machine learning algorithms.
Imagine you have three friends whose individual weight and height you know.
You would like to deduce Chris’ T-shirt size from Cameron’s and Sarah’s by looking at the height and weight.
[Table: height in m and weight in kg of each friend]
One way you could determine the shirt size is to just add up the weight and the height of each friend. You would get: Continue reading “Lesson 10: Feature Scaling”
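To illustrate why simply adding height and weight is problematic, here is a small sketch. The measurements are hypothetical (the values from the lesson are not reproduced here), and min-max scaling is only one common scaling choice; the lesson itself may use a different formula.

```python
import numpy as np

# Hypothetical measurements, in the order Cameron, Sarah, Chris (not the lesson's values)
height_m = np.array([1.80, 1.60, 1.70])
weight_kg = np.array([80.0, 55.0, 70.0])


def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    return (values - values.min()) / (values.max() - values.min())


# Naive sum: the weight dominates because its numbers are much larger
naive_sum = height_m + weight_kg

# Sum after scaling: both features contribute on the same 0..1 scale
scaled_sum = min_max_scale(height_m) + min_max_scale(weight_kg)

print(naive_sum)
print(scaled_sum)
```

In the naive sum, the height changes the result only in the first decimal place, while after scaling both features carry comparable weight.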
Having already introduced precision and recall, we now look at the ROC curve, another way of judging the quality of classification algorithms.
ROC stands for Receiver Operating Characteristic.
The ROC curve is created by plotting the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis at various threshold settings.
You already know the TPR as recall or sensitivity.
The false positive rate is defined as FPR = FP / (FP + TN), where FP is the number of false positives and TN the number of true negatives.
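The two rates can be computed by hand for a single threshold, as in the sketch below. The labels, scores, thresholds and the helper name rates_at_threshold are made up for illustration, and the sketch assumes both classes occur in the labels.

```python
def rates_at_threshold(y_true, y_score, threshold):
    """Return (TPR, FPR) when every score >= threshold is predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn)  # recall / sensitivity
    fpr = fp / (fp + tn)  # false positive rate
    return tpr, fpr


# Hypothetical toy data: true labels and classifier scores
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

for threshold in (0.9, 0.5, 0.3, 0.0):
    print(threshold, rates_at_threshold(y_true, y_score, threshold))
```

Each threshold yields one (FPR, TPR) point; plotting these points for many thresholds gives the ROC curve.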
ROC curves have a big advantage: they are insensitive to changes in class distribution.
from sklearn.metrics import roc_curve
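A short usage sketch of roc_curve, assuming scikit-learn is installed. The labels and scores are hypothetical toy data, and roc_auc_score is added here only to summarize the curve in a single number.

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical toy data: true labels and classifier scores
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# roc_curve returns one FPR/TPR pair per threshold it evaluates
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the ROC curve: 1.0 is a perfect ranking, 0.5 is chance level
auc = roc_auc_score(y_true, y_score)

print(fpr, tpr, auc)
```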
Pandas is a data analysis tool. Together with numpy and matplotlib it is part of the data science stack.
You can install it via:
pip install pandas
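To check that the installation works, a tiny DataFrame can be built in memory, as sketched below. The column names and values are invented for this example and have nothing to do with the data set used in the article.

```python
import pandas as pd

# Hypothetical toy data, only to verify that pandas is installed and working
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "missions": [2, 1, 3],
})

print(df.head())               # first rows of the table
print(df["missions"].mean())   # average of one column
```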
Working with real data
The data set we are using is the astronauts data set from kaggle:
Continue reading “Data Science: Pandas”
Applying for a data scientist job offer? Tired of writing the same old curriculum vitae?
Why not show your data visualization skills directly in your application?
Continue reading “Curriculum Vitae for Data Scientists”
To work with OpenCV from Python, you need to install it first:
pip install opencv-python
After we import cv2 we can directly work with images like so:
img = cv2.imread("doc_brown.png")
Continue reading “Intro to OpenCV with Python”
Too confused by the confusion matrix?
Let me bring some clarity to this topic!
Continue reading “Confusion Matrix”
With numpy you can easily create test data with random_integers and randint.
numpy.random.randint(low, high=None, size=None, dtype='l')
numpy.random.random_integers(low, high=None, size=None)
random_integers includes the high boundary while randint does not. Continue reading “numpy random choice”
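The boundary behaviour can be checked with a short sketch. Because random_integers has been deprecated since NumPy 1.11, the example does not call it; instead it shows two ways to get an inclusive upper bound: randint with high + 1, and the newer Generator.integers with endpoint=True. The bounds, sample size and seeds are arbitrary choices for this illustration.

```python
import numpy as np

low, high = 1, 6  # hypothetical bounds, e.g. a six-sided die
np.random.seed(0)

# randint excludes the high boundary: values are in [1, 5]
exclusive = np.random.randint(low, high, size=1000)

# Adding 1 to high includes the boundary: values are in [1, 6]
inclusive = np.random.randint(low, high + 1, size=1000)

# The Generator API offers the same via endpoint=True
rng = np.random.default_rng(0)
also_inclusive = rng.integers(low, high, size=1000, endpoint=True)

print(exclusive.max(), inclusive.max(), also_inclusive.max())
```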