<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>sklearn Archives - Creatronix</title>
	<atom:link href="https://creatronix.de/tag/sklearn/feed/" rel="self" type="application/rss+xml" />
	<link>https://creatronix.de/tag/sklearn/</link>
	<description>My adventures in code &#38; business</description>
	<lastBuildDate>Thu, 09 Oct 2025 14:52:02 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Linear Regression with sklearn &#8211; cheat sheet</title>
		<link>https://creatronix.de/linear-regression-with-sklearn-cheat-sheet/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Tue, 04 Feb 2020 13:01:14 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[cheat sheet]]></category>
		<category><![CDATA[Linear Regression]]></category>
		<category><![CDATA[linear_model]]></category>
		<category><![CDATA[sklearn]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=3097</guid>

					<description><![CDATA[<p># import and instantiate model from sklearn.linear_model import LinearRegression model = LinearRegression() # prepare training data features_train = df_train.loc[:, ['feature_name']] target_train = df_train.loc[:, 'target_name'] # fit (train) model and print coefficient and intercept model.fit(features_train, target_train) print(model.coef_) print(model.intercept_) # calculate model quality on the training data from sklearn.metrics import mean_squared_error from sklearn.metrics import r2_score target_prediction = model.predict(features_train) print(mean_squared_error(target_train, target_prediction))&#8230;</p>
<p>The post <a href="https://creatronix.de/linear-regression-with-sklearn-cheat-sheet/">Linear Regression with sklearn &#8211; cheat sheet</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<pre># import and instantiate model
from sklearn.linear_model import LinearRegression
model = LinearRegression()

# prepare training data (scikit-learn expects a 2D feature matrix, hence the double brackets)
features_train = df_train.loc[:, ['feature_name']]
target_train = df_train.loc[:, 'target_name']

# fit (train) model and print coefficient and intercept
model.fit(features_train, target_train)
print(model.coef_)
print(model.intercept_)

# calculate model quality on the training data
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

target_prediction = model.predict(features_train)
print(mean_squared_error(target_train, target_prediction))
print(r2_score(target_train, target_prediction))

# test predictions on held-out data (df_test is assumed to hold rows not used for training)
features_test = df_test.loc[:, ['feature_name']]
target_test = df_test.loc[:, 'target_name']
target_prediction_test = model.predict(features_test)
print(mean_squared_error(target_test, target_prediction_test))
print(r2_score(target_test, target_prediction_test))</pre>
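<p>The cheat sheet assumes that the training and test DataFrames already exist. The following is a minimal end-to-end sketch with synthetic data so the steps can be run as-is. The data (a line with slope 3 and intercept 2 plus noise), the 25 % test split and the random seeds are assumptions made for illustration only.</p>

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): y = 3x + 2 plus a little Gaussian noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=200)
df = pd.DataFrame({
    'feature_name': x,
    'target_name': 3 * x + 2 + rng.normal(0, 0.5, size=200),
})

# Hold out 25 % of the rows for testing
df_train, df_test = train_test_split(df, test_size=0.25, random_state=0)

# Fit on the training rows only; the feature matrix must be 2D
model = LinearRegression()
model.fit(df_train.loc[:, ['feature_name']], df_train.loc[:, 'target_name'])

# Evaluate on the rows the model has not seen
target_test = df_test.loc[:, 'target_name']
target_prediction_test = model.predict(df_test.loc[:, ['feature_name']])
mse_test = mean_squared_error(target_test, target_prediction_test)
r2_test = r2_score(target_test, target_prediction_test)

print(model.coef_, model.intercept_)  # should be close to slope 3 and intercept 2
print(mse_test, r2_test)
```

<p>Because the test rows were not used for fitting, the test scores give a more honest picture of model quality than the training scores.</p>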
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Confusion Matrix</title>
		<link>https://creatronix.de/confusion-matrix/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Tue, 03 Jul 2018 10:51:38 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[confusion matrix]]></category>
		<category><![CDATA[dog]]></category>
		<category><![CDATA[false positive]]></category>
		<category><![CDATA[sklearn]]></category>
		<category><![CDATA[true positive]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=1688</guid>

					<description><![CDATA[<p>Confused by the confusion matrix? Let me bring some clarity to this topic! Let&#8217;s take the example from Precision and Recall: y_true = ["dog", "dog", "non-dog", "non-dog", "dog", "dog"] y_pred = ["dog", "non-dog", "dog", "non-dog", "dog", "non-dog"] Comparing the predictions with the true labels, we can count the correct and incorrect classifications: dog correctly classified&#8230;</p>
<p>The post <a href="https://creatronix.de/confusion-matrix/">Confusion Matrix</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h2>Confused by the confusion matrix?</h2>
<p>Let me bring some clarity to this topic!</p>
<p>Let&#8217;s take the example from <a href="https://creatronix.de/classification-precision-and-recall/">Precision and Recall:</a></p>
<pre>y_true = ["dog", "dog",     "non-dog", "non-dog", "dog", "dog"]
y_pred = ["dog", "non-dog", "dog",     "non-dog", "dog", "non-dog"]</pre>
<p>Comparing the predictions with the true labels, we can count the correct and incorrect classifications:</p>
<ul>
<li>dog correctly classified as dog: 2 times (True Positive)</li>
<li>non-dog incorrectly classified as dog: 1 time (False Positive)</li>
<li>dog incorrectly classified as non-dog: 2 times (False Negative)</li>
<li>non-dog correctly classified as non-dog: 1 time (True Negative)</li>
</ul>
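<p>The four counts from the list can be reproduced with a few lines of plain Python. This is a small sketch that treats "dog" as the positive class; the variable names are chosen for illustration.</p>

```python
y_true = ["dog", "dog", "non-dog", "non-dog", "dog", "dog"]
y_pred = ["dog", "non-dog", "dog", "non-dog", "dog", "non-dog"]

positive = "dog"  # the class we treat as "positive"
pairs = list(zip(y_true, y_pred))

# Count each combination of actual label (t) and predicted label (p)
tp = sum(t == positive and p == positive for t, p in pairs)  # dog predicted as dog
fp = sum(t != positive and p == positive for t, p in pairs)  # non-dog predicted as dog
fn = sum(t == positive and p != positive for t, p in pairs)  # dog predicted as non-dog
tn = sum(t != positive and p != positive for t, p in pairs)  # non-dog predicted as non-dog

print(tp, fp, fn, tn)  # 2 1 2 1
```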
<p>Arranging these four counts in a matrix gives us the confusion matrix:<br />
<img fetchpriority="high" decoding="async" class="alignnone size-full wp-image-1690" src="https://creatronix.de/wp-content/uploads/2018/07/confusion_matrix.png" alt="" width="479" height="480" srcset="https://creatronix.de/wp-content/uploads/2018/07/confusion_matrix.png 479w, https://creatronix.de/wp-content/uploads/2018/07/confusion_matrix-150x150.png 150w, https://creatronix.de/wp-content/uploads/2018/07/confusion_matrix-300x300.png 300w, https://creatronix.de/wp-content/uploads/2018/07/confusion_matrix-100x100.png 100w" sizes="(max-width: 479px) 100vw, 479px" /></p>
<h2>sklearn</h2>
<p>We can calculate the confusion matrix with sklearn in a single call:</p>
<pre>from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_true, y_pred, labels=["dog", "non-dog"]))
</pre>
<p>The output is:</p>
<pre>[[2 2]
 [1 1]]</pre>
<p>This can indeed be confusing, because the matrix is transposed compared to our matrix from above: in sklearn&#8217;s output the rows are the actual values and the columns are the predictions:</p>
<p><img decoding="async" class="alignnone size-full wp-image-1692" src="https://creatronix.de/wp-content/uploads/2018/07/confusion_matrix_2.png" alt="" width="479" height="480" srcset="https://creatronix.de/wp-content/uploads/2018/07/confusion_matrix_2.png 479w, https://creatronix.de/wp-content/uploads/2018/07/confusion_matrix_2-150x150.png 150w, https://creatronix.de/wp-content/uploads/2018/07/confusion_matrix_2-300x300.png 300w, https://creatronix.de/wp-content/uploads/2018/07/confusion_matrix_2-100x100.png 100w" sizes="(max-width: 479px) 100vw, 479px" /></p>
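<p>To make the orientation explicit, the matrix can be wrapped in a pandas DataFrame with labeled rows and columns. This is a sketch; the label texts "actual" and "predicted" are chosen for illustration, and the unpacking order in the last step only holds because "dog" is listed first in labels.</p>

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

y_true = ["dog", "dog", "non-dog", "non-dog", "dog", "dog"]
y_pred = ["dog", "non-dog", "dog", "non-dog", "dog", "non-dog"]
labels = ["dog", "non-dog"]

cm = confusion_matrix(y_true, y_pred, labels=labels)

# Rows are the actual classes, columns are the predicted classes
cm_df = pd.DataFrame(
    cm,
    index=[f"actual {label}" for label in labels],
    columns=[f"predicted {label}" for label in labels],
)
print(cm_df)

# With the positive class "dog" listed first, flattening row by row
# yields TP, FN, FP, TN in that order
tp, fn, fp, tn = cm.ravel()
print(tp, fn, fp, tn)  # 2 2 1 1
```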
<p>And that&#8217;s all &#8211; if you just have a binary classifier.</p>
<h2>Multi-class classifier</h2>
<p>So what happens when your classifier can decide between three outcomes, say dog, cat and rabbit? (You can generate the test data with <a href="https://creatronix.de/numpy-random-choice/">numpy random choice</a>.)</p>
<pre>y_true = ['rabbit', 'dog', 'rabbit', 'cat', 'cat', 'cat', 'cat', 'dog', 'cat']
y_pred = ['rabbit', 'rabbit', 'dog', 'cat', 'dog', 'rabbit', 'dog', 'cat', 'dog']

cm = confusion_matrix(y_true, y_pred, labels=["dog", "rabbit", "cat"])
print(cm)</pre>
<pre>[[0 1 1]
 [1 1 0]
 [3 1 1]]</pre>
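<p>The three-class matrix is read the same way as the binary one: each row is an actual class, each column a predicted class, and the diagonal holds the correct classifications. A short sketch of how these numbers could be pulled out with numpy (the variable names are illustrative):</p>

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = ['rabbit', 'dog', 'rabbit', 'cat', 'cat', 'cat', 'cat', 'dog', 'cat']
y_pred = ['rabbit', 'rabbit', 'dog', 'cat', 'dog', 'rabbit', 'dog', 'cat', 'dog']
labels = ["dog", "rabbit", "cat"]

cm = confusion_matrix(y_true, y_pred, labels=labels)

correct = np.diag(cm)           # correctly classified samples per class
actual_totals = cm.sum(axis=1)  # number of actual samples per class (row sums)

for label, hit, total in zip(labels, correct, actual_totals):
    print(f"{label}: {hit} of {total} classified correctly")

# Overall accuracy: correct classifications divided by all samples
accuracy = correct.sum() / cm.sum()
print(accuracy)
```

<p>Here no dog, one of two rabbits and one of five cats are classified correctly, so only two of the nine samples land on the diagonal.</p>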
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Data Science Datasets: Iris flower data set</title>
		<link>https://creatronix.de/data-science-datasets-iris-flower-data-set/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Wed, 25 Apr 2018 08:55:12 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[iris flower data set]]></category>
		<category><![CDATA[scikit-learn]]></category>
		<category><![CDATA[sklearn]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=1373</guid>

					<description><![CDATA[<p>Motivation When you start learning data science, acquiring data is often the first step. To get you started, scikit-learn comes with a bunch of so-called &#8220;toy datasets&#8221;. One of them is the Iris dataset. Prerequisites &#38; Imports Besides scikit-learn we will use pandas for data handling and matplotlib with&#8230;</p>
<p>The post <a href="https://creatronix.de/data-science-datasets-iris-flower-data-set/">Data Science Datasets: Iris flower data set</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h2>Motivation</h2>
<p>When you start learning data science, acquiring data is often the first step.</p>
<p>To get you started, scikit-learn comes with a bunch of so-called &#8220;toy datasets&#8221;. One of them is the Iris dataset.</p>
<h2>Prerequisites &amp; Imports</h2>
<p>Besides scikit-learn we will use <a href="https://creatronix.de/introduction-to-pandas/">pandas</a> for data handling and <a href="https://creatronix.de/introduction-to-matplotlib/">matplotlib</a> with seaborn for visualization. So let&#8217;s install them:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>pip install scikit-learn pandas seaborn matplotlib</code></pre>
</div>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>from sklearn import datasets
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Jupyter magic: render plots inline (omit this line in a plain Python script)
%matplotlib inline
sns.set_palette('husl')</code></pre>
</div>
<h2>Iris data set</h2>
<p>The Iris flower data set or Fisher&#8217;s Iris data set became a typical test case for many statistical classification techniques in machine learning such as support vector machines.</p>
<p>It is sometimes called Anderson&#8217;s <i>Iris</i> data set because Edgar Anderson collected the data to quantify the morphological variation of <i>Iris</i> flowers of three related species.</p>
<p>The data set can be loaded from scikit-learn as follows:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>iris = datasets.load_iris() 
</code></pre>
</div>
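<p>The object returned by load_iris behaves like a dictionary. A quick look at its main entries shows what we are working with (a small sketch, nothing here is specific to this post beyond the keys used later):</p>

```python
from sklearn import datasets

iris = datasets.load_iris()

print(iris['data'].shape)     # (150, 4): 150 flowers, 4 measurements each
print(iris['feature_names'])  # names of the four measurement columns
print(iris['target_names'])   # the three species: setosa, versicolor, virginica
print(iris['target'][:5])     # integer class codes, 0 stands for setosa
```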
<div>
<h2>Convert to pandas DataFrame</h2>
</div>
<p>To work with the dataset, we convert it into a pandas DataFrame and replace the integer class codes with species names.</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code># one column per measurement
df = pd.DataFrame(
    iris['data'],
    columns=iris['feature_names']
)
# add the target and translate the integer class codes into species names
df['species'] = iris['target']
df['species'] = df['species'].map({
    0: 'Iris-setosa',
    1: 'Iris-versicolor',
    2: 'Iris-virginica'
})</code></pre>
</div>
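<p>After the conversion it is worth checking that the DataFrame looks as expected. The following sketch repeats the conversion so it runs on its own and then prints a few sanity checks:</p>

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris['data'], columns=iris['feature_names'])
df['species'] = iris['target']
df['species'] = df['species'].map({
    0: 'Iris-setosa',
    1: 'Iris-versicolor',
    2: 'Iris-virginica',
})

print(df.shape)                      # (150, 5): four measurements plus the species column
print(df['species'].value_counts())  # 50 rows for each of the three species
print(df['species'].isna().sum())    # 0 means every class code was mapped to a name
```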
<div>
<h2>Data visualization</h2>
<p>Seaborn has a nice way to visualize data for exploration: the pairplot function.</p>
<p>It plots every feature against every other feature, while the diagonal shows the distribution of each feature on its own.</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>g = sns.pairplot(df, hue='species', markers='+')
plt.show()</code></pre>
</div>
</div>
<p><img decoding="async" src="https://creatronix.de/wp-content/uploads/2018/04/iris_sns_pairplot-1024x888.png" alt="Seaborn pair plot of the Iris data set, colored by species" class="alignnone size-large wp-image-5972" width="1024" height="888" srcset="https://creatronix.de/wp-content/uploads/2018/04/iris_sns_pairplot-1024x888.png 1024w, https://creatronix.de/wp-content/uploads/2018/04/iris_sns_pairplot-300x260.png 300w, https://creatronix.de/wp-content/uploads/2018/04/iris_sns_pairplot-768x666.png 768w, https://creatronix.de/wp-content/uploads/2018/04/iris_sns_pairplot.png 1137w" sizes="(max-width: 1024px) 100vw, 1024px" /></p>
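<p>What the pair plot shows visually can be cross-checked with a few numbers. As a rough sketch, the mean of every measurement per species gives a first impression of how far the species are apart; the DataFrame is rebuilt here so the snippet runs on its own.</p>

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris['data'], columns=iris['feature_names'])
df['species'] = iris['target']
df['species'] = df['species'].map({
    0: 'Iris-setosa',
    1: 'Iris-versicolor',
    2: 'Iris-virginica',
})

# Mean of every measurement per species: one row per species, one column per feature
means = df.groupby('species').mean()
print(means.round(2))
```

<p>In this table the petal measurements differ much more between the species than the sepal measurements do, which matches the clusters in the plot.</p>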
<h2>Further Reading</h2>
<p><a href="https://scikit-learn.org/stable/datasets/toy_dataset.html#iris-plants-dataset">https://scikit-learn.org/stable/datasets/toy_dataset.html#iris-plants-dataset</a></p>
<p><a href="https://www.kaggle.com/code/jchen2186/machine-learning-with-iris-dataset">https://www.kaggle.com/code/jchen2186/machine-learning-with-iris-dataset</a></p>
<p><a href="https://creatronix.de/introduction-to-jupyter-notebook/">Introduction to Jupyter Notebook</a></p>
<p><a href="https://creatronix.de/introduction-to-pandas/">Introduction to Pandas</a></p>
<p><a href="https://creatronix.de/pandas-cheat-sheet/">Pandas Cheat Sheet</a></p>
<p><a href="https://creatronix.de/introduction-to-matplotlib/">Introduction to matplotlib</a></p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
