sklearn Archives - Creatronix

Linear Regression with sklearn – cheat sheet

Jörn — Tue, 04 Feb 2020 13:01:14 +0000

# import and instantiate model
from sklearn.linear_model import LinearRegression
model = LinearRegression()

# prepare training data (sklearn expects 2D features, hence the double brackets)
features_train = df_train.loc[:, ['feature_name']]
target_train = df_train.loc[:, 'target_name']

# fit (train) model and print coefficient and intercept
model.fit(features_train, target_train)
print(model.coef_)
print(model.intercept_)

# calculate model quality on the training data
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

target_prediction = model.predict(features_train)
print(mean_squared_error(target_train, target_prediction))
print(r2_score(target_train, target_prediction))

# test predictions on held-out data (df_test is assumed to have the same columns as df_train)
features_test = df_test.loc[:, ['feature_name']]
target_test = df_test.loc[:, 'target_name']
target_prediction_test = model.predict(features_test)
print(mean_squared_error(target_test, target_prediction_test))
print(r2_score(target_test, target_prediction_test))
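
The cheat sheet assumes that the data frames already exist. Here is a minimal end-to-end sketch under assumptions of my own: the data is synthetic (y = 3x + 2 plus noise), the column names feature_name and target_name are placeholders, and the split into df_train and df_test is done with train_test_split.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumption for illustration): y = 3x + 2 plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
df = pd.DataFrame({
    "feature_name": x,
    "target_name": 3 * x + 2 + rng.normal(0, 1, size=200),
})

# Hold out 25 % of the rows for testing
df_train, df_test = train_test_split(df, test_size=0.25, random_state=0)

# Fit on the training rows only; double brackets keep the features 2D
model = LinearRegression()
model.fit(df_train[["feature_name"]], df_train["target_name"])

# Evaluate on the held-out rows
target_prediction_test = model.predict(df_test[["feature_name"]])
mse_test = mean_squared_error(df_test["target_name"], target_prediction_test)
r2_test = r2_score(df_test["target_name"], target_prediction_test)

print(model.coef_, model.intercept_)
print(mse_test, r2_test)
```

The fitted coefficient should land close to the slope of 3 used to generate the data, which is a quick sanity check that the pipeline is wired up correctly.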

The post Linear Regression with sklearn – cheat sheet appeared first on Creatronix.

Confusion Matrix

Jörn — Tue, 03 Jul 2018 10:51:38 +0000

Confused by the confusion matrix?

Let me bring some clarity to this topic!

Let’s take the example from Precision and Recall:

y_true = ["dog", "dog",     "non-dog", "non-dog", "dog", "dog"]
y_pred = ["dog", "non-dog", "dog",     "non-dog", "dog", "non-dog"]

Comparing y_pred with y_true element by element, we can count the correct and incorrect classifications:

dog correctly classified as dog: 2 times (True Positive)
non-dog incorrectly classified as dog: 1 time (False Positive)
dog incorrectly classified as non-dog: 2 times (False Negative)
non-dog correctly classified as non-dog: 1 time (True Negative)
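
The four counts can be reproduced with a few lines of plain Python. This is only a sketch; treating "dog" as the positive class and the variable names tp, fp, fn and tn are my own choices.

```python
y_true = ["dog", "dog", "non-dog", "non-dog", "dog", "dog"]
y_pred = ["dog", "non-dog", "dog", "non-dog", "dog", "non-dog"]

positive = "dog"  # assumption: "dog" is the positive class
pairs = list(zip(y_true, y_pred))

# Each sum counts the (actual, predicted) pairs matching one cell of the matrix
tp = sum(t == positive and p == positive for t, p in pairs)
fp = sum(t != positive and p == positive for t, p in pairs)
fn = sum(t == positive and p != positive for t, p in pairs)
tn = sum(t != positive and p != positive for t, p in pairs)

print(tp, fp, fn, tn)  # 2 1 2 1
```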

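When we arrange these results in a matrix, with the predictions as rows and the actual values as columns, we already have the confusion matrix:

                     actual dog   actual non-dog
predicted dog        2 (TP)       1 (FP)
predicted non-dog    2 (FN)       1 (TN)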

sklearn

We can calculate the confusion matrix with sklearn in just two lines:

from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_true, y_pred, labels=["dog", "non-dog"]))

The output is:

[[2 2]
 [1 1]]

This can indeed be confusing because the matrix is transposed: in contrast to our matrix from above, the columns are the predictions and the rows are the actual values.

And that’s all – if you just have a binary classifier.
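
To avoid mixing up the cells, the sklearn result can be unpacked by name. A small sketch, assuming "dog" is the positive class and is listed first in labels; the names tp, fn, fp and tn are mine. Transposing the array with .T gives the layout with predictions as rows.

```python
from sklearn.metrics import confusion_matrix

y_true = ["dog", "dog", "non-dog", "non-dog", "dog", "dog"]
y_pred = ["dog", "non-dog", "dog", "non-dog", "dog", "non-dog"]

cm = confusion_matrix(y_true, y_pred, labels=["dog", "non-dog"])

# Rows are actual values, columns are predictions
(tp, fn), (fp, tn) = cm
print(tp, fn, fp, tn)

# Transposed view: rows are predictions, columns are actual values
print(cm.T)
```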

Multi-class classifier

So what happens when your classifier can decide between three outcomes, say dog, cat and rabbit? (You can generate the test data with numpy's random choice.)

y_true = ['rabbit', 'dog', 'rabbit', 'cat', 'cat', 'cat', 'cat', 'dog', 'cat']
y_pred = ['rabbit', 'rabbit', 'dog', 'cat', 'dog', 'rabbit', 'dog', 'cat', 'dog']

cm = confusion_matrix(y_true, y_pred, labels=["dog", "rabbit", "cat"])
print(cm)

[[0 1 1]
 [1 1 0]
 [3 1 1]]
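
Generating such test data with numpy could look like the sketch below. The seed and the sample size of 9 are arbitrary choices of mine, so the resulting matrix will differ from the one above. The diagonal of the matrix holds the correctly classified samples for each class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["dog", "rabbit", "cat"]

# Draw random "actual" and "predicted" labels (seed chosen arbitrarily)
rng = np.random.default_rng(42)
y_true = rng.choice(labels, size=9)
y_pred = rng.choice(labels, size=9)

cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Diagonal cells are the samples where prediction and actual value agree
correct = cm.diagonal().sum()
print(correct, "of", cm.sum(), "classified correctly")
```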


Data Science Datasets: Iris flower data set

Jörn — Wed, 25 Apr 2018 08:55:12 +0000

Motivation

When you start learning data science, the acquisition of data is often the first step.

To get you started, scikit-learn comes with a bunch of so-called “toy datasets”. One of them is the Iris dataset.

Prerequisites & Imports

Besides scikit-learn we will use pandas for data handling and matplotlib with seaborn for visualization. So let’s install them:

pip install scikit-learn pandas seaborn matplotlib

from sklearn import datasets
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
sns.set_palette('husl')
# Jupyter magic: only valid inside a notebook, remove it in a plain Python script
%matplotlib inline

Iris data set

The Iris flower data set or Fisher’s Iris data set became a typical test case for many statistical classification techniques in machine learning such as support vector machines.

It is sometimes called Anderson’s Iris data set because Edgar Anderson collected the data to quantify the morphological variation of Iris flowers of three related species.

This data set can be loaded from scikit-learn as follows:

iris = datasets.load_iris()
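
Before converting anything, it helps to look at what load_iris() returns. A short sketch of the fields used in this post, accessed with the same dictionary-style keys:

```python
from sklearn import datasets

iris = datasets.load_iris()

# 150 samples with 4 measurements each
print(iris["data"].shape)  # (150, 4)

# Names of the four measurement columns
print(iris["feature_names"])

# Class names; the integer targets 0, 1, 2 index into this array
print(iris["target_names"])  # ['setosa' 'versicolor' 'virginica']
```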

Convert to pandas DataFrame

To work with the dataset we convert it into a pandas DataFrame.

df = pd.DataFrame(
    iris['data'],
    columns=iris['feature_names']
)
df['species'] = iris['target']
df['species'] = df['species'].map({
    0 : 'Iris-setosa',
    1 : 'Iris-versicolor',
    2 : 'Iris-virginica'
})
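
A quick sanity check of the conversion, as a sketch that rebuilds the same DataFrame so it runs on its own: the frame should have 150 rows and 5 columns, 50 rows per species, and no missing species labels after the mapping.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris["data"], columns=iris["feature_names"])
df["species"] = pd.Series(iris["target"]).map({
    0: "Iris-setosa",
    1: "Iris-versicolor",
    2: "Iris-virginica",
})

print(df.shape)                      # (150, 5)
print(df["species"].value_counts())  # 50 rows per species
print(df["species"].isna().sum())    # 0, every target value was mapped
```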

Data visualization

Seaborn has a nice way to visualize data for exploration: the pairplot function.

It plots every feature against every other feature, colored here by species.

g = sns.pairplot(df, hue='species', markers='+')
plt.show()