Data Science & SQL Archives - Page 3 of 5

Feature Scaling

Data Science & SQL | By Jörn | October 5, 2018

What is Feature Scaling? Feature scaling is an important pre-processing step for some machine learning algorithms. Imagine you have three friends whose height and weight you know. You would like to deduce Christian’s t-shirt size from David’s and Julia’s by looking at their height and weight. (Table columns: Name, Height in m, Weight in…)
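The excerpt is cut off before the table, so the following is only a minimal sketch of one common kind of feature scaling (min-max rescaling to the range 0 to 1). The heights and weights are made-up placeholder values, and the helper name `min_max_scale` is hypothetical, not taken from the post.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Placeholder measurements (assumed, not from the original table).
names = ["David", "Julia", "Christian"]
heights_m = [1.60, 1.75, 1.90]
weights_kg = [55.0, 70.0, 95.0]

scaled_heights = min_max_scale(heights_m)
scaled_weights = min_max_scale(weights_kg)

for name, h, w in zip(names, scaled_heights, scaled_weights):
    print(name, round(h, 3), round(w, 3))
```

After scaling, both features live on the same 0 to 1 range, so a difference of a few kilograms no longer outweighs a difference of a few centimetres simply because of the units.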

Receiver Operating Characteristic

Data Science & SQL | By Jörn | August 4, 2018

ROC Curve: We have already introduced Precision and Recall; the ROC curve is another way of looking at the quality of classification algorithms. ROC stands for Receiver Operating Characteristic. The ROC curve is created by plotting the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis at various…
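As a rough illustration of plotting TPR against FPR, the sketch below computes one (FPR, TPR) point per threshold. The labels, scores, thresholds, and the function name `roc_points` are all assumptions made for this example.

```python
def roc_points(y_true, scores, thresholds):
    """Return one (FPR, TPR) pair per threshold.

    y_true holds 1 for positive and 0 for negative samples;
    a sample is predicted positive when its score >= threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Made-up labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

points = roc_points(y_true, scores, thresholds=[1.0, 0.75, 0.5, 0.0])
for fpr, tpr in points:
    print(round(fpr, 3), round(tpr, 3))
```

Lowering the threshold moves the point from (0, 0) towards (1, 1); connecting the points gives the curve.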

Introduction to Pandas

Data Science & SQL, Python | By Jörn | August 1, 2018

Pandas is a data analysis tool. Together with numpy and matplotlib, it is part of the data science stack. You can install it via pip install pandas. Working with real data: The data set we are using is the NASA Astronauts data set from Kaggle. During this introduction we…
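A minimal sketch of the load-and-inspect workflow, assuming pandas is installed. The excerpt does not show the real file or its columns, so the CSV text, the column names (`Name`, `Year`, `Missions`), and the rows below are placeholders rather than actual astronaut data.

```python
import io

import pandas as pd

# Placeholder CSV standing in for the downloaded data set (assumed layout).
csv_text = """Name,Year,Missions
Astronaut A,1996,2
Astronaut B,2004,1
Astronaut C,1996,3
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Look at the first rows, then aggregate missions per selection year.
print(df.head())
missions_per_year = df.groupby("Year")["Missions"].sum()
print(missions_per_year)
```

With the real download, the `io.StringIO(...)` argument would be replaced by the path to the CSV file.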

Creating a Curriculum Vitae with Jupyter, Pandas and Matplotlib

Data Science & SQL | By Jörn | July 26, 2018

Applying for a data scientist job? Tired of writing the same old curriculum vitae? Why not show your data visualization skills directly in your application? Generate Data: Instead of squeezing your data about education, employment and skills into a Word-like document, put it in tables. E.g. use OpenOffice to create and edit…

Intro to OpenCV with Python

Data Science & SQL | By Jörn | July 23, 2018

Installation: To work with OpenCV from Python, you need to install it first. We also install numpy and matplotlib: pip install opencv-python numpy matplotlib. Reading images from file: After importing cv2, we can work with images directly, like so: import cv2; img = cv2.imread("doc_brown.png") For showing the image, it is recommended to…

Confusion Matrix

Data Science & SQL | By Jörn | July 3, 2018

Confused by the confusion matrix? Let me bring some clarity to this topic! Let’s take the example from Precision and Recall: y_true = ["dog", "dog", "non-dog", "non-dog", "dog", "dog"]; y_pred = ["dog", "non-dog", "dog", "non-dog", "dog", "non-dog"] When we look at the predictions, we can count the correct and incorrect classifications: dog correctly classified…
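The counting described above can be sketched in a few lines of plain Python. The two lists come from the excerpt; treating "dog" as the positive class and the variable names are choices made for this example.

```python
from collections import Counter

y_true = ["dog", "dog", "non-dog", "non-dog", "dog", "dog"]
y_pred = ["dog", "non-dog", "dog", "non-dog", "dog", "non-dog"]

# Count every (actual, predicted) combination once.
counts = Counter(zip(y_true, y_pred))

tp = counts[("dog", "dog")]          # dogs recognised as dogs
fn = counts[("dog", "non-dog")]      # dogs that were missed
fp = counts[("non-dog", "dog")]      # non-dogs mistaken for dogs
tn = counts[("non-dog", "non-dog")]  # non-dogs recognised as non-dogs

# Rows are the actual class, columns the predicted class.
confusion_matrix = [[tp, fn], [fp, tn]]
print(confusion_matrix)
```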

numpy random choice

Data Science & SQL, Python | By Jörn | July 3, 2018

With numpy you can easily create test data with random_integers and randint:

    numpy.random.randint(low, high=None, size=None, dtype='l')
    numpy.random.random_integers(low, high=None, size=None)

random_integers includes the high boundary while randint does not.

    >>> import numpy as np
    >>> np.random.random_integers(5)
    4
    >>> np.random.random_integers(5, size=(5))
    array([5, 3, 4, 1, 4])
    >>> np.random.random_integers(5, size=(5, 4))
    array([[2, 3, 3, 5], [1, 3, 1,…
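The NumPy documentation marks random_integers as deprecated (since version 1.11), so it may warn or be unavailable depending on the installed version. The sketch below shows how the same inclusive range could be drawn with randint by adding 1 to the upper bound; the variable names are made up for this example.

```python
import numpy as np

low, high = 1, 5

# randint excludes the upper bound, so pass high + 1 to make 5 a possible value.
samples = np.random.randint(low, high + 1, size=(5, 4))
print(samples)
```

Every value in `samples` lies between 1 and 5 inclusive, matching what random_integers(5, size=(5, 4)) draws from.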

Classification: Precision and Recall

Data Science & SQL | By Jörn | June 28, 2018

In the realm of Data Science you’ll sooner or later encounter the terms “Precision” and “Recall”. But what do they mean? Clarification: Living together with little kids, you very often run into classification issues: My daughter really likes dogs, so seeing a dog is something positive. When she sees a normal dog e.g. a…
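For reference, here is a small sketch using the standard definitions: precision is the share of predicted positives that are really positive, and recall is the share of actual positives that were found. It uses the dog example lists quoted in the Confusion Matrix excerpt on this page, with "dog" as the positive class; the variable names are assumptions.

```python
y_true = ["dog", "dog", "non-dog", "non-dog", "dog", "dog"]
y_pred = ["dog", "non-dog", "dog", "non-dog", "dog", "non-dog"]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == "dog" and p == "dog")
fp = sum(1 for t, p in pairs if t == "non-dog" and p == "dog")
fn = sum(1 for t, p in pairs if t == "dog" and p == "non-dog")

# Precision: of everything called "dog", how much really was a dog?
precision = tp / (tp + fp)
# Recall: of all real dogs, how many were called "dog"?
recall = tp / (tp + fn)

print(round(precision, 3), round(recall, 3))
```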

Lesson 2: Naive Bayes

Data Science & SQL, Python | By Jörn | June 19, 2018

Lesson 2 of the Udacity course UD120 – Intro to Machine Learning deals with Naive Bayes classification. Mini project: For the mini project you should fork https://github.com/udacity/ud120-projects and clone it. It is recommended to install a 64-bit version of Python 2.7 because ML is heavy data processing and can easily use up more than 2GB of…

Lesson 3: Support Vector Machines

Data Science & SQL | By Jörn | June 6, 2018

The term Support Vector Machines, or SVM, is a bit misleading. It is just a name for a very clever algorithm invented by two Russians in the 1960s. SVMs are used for classification and regression. They do that by finding the hyperplane that best separates two classes of data. print("Start training")…
