<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>data science Archives - Creatronix</title>
	<atom:link href="https://creatronix.de/tag/data-science/feed/" rel="self" type="application/rss+xml" />
	<link>https://creatronix.de/tag/data-science/</link>
	<description>My adventures in code &#38; business</description>
	<lastBuildDate>Fri, 27 Feb 2026 11:15:17 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>10 things I didn&#8217;t know about Data Science a year ago</title>
		<link>https://creatronix.de/10-things-i-didnt-know-about-data-science-a-year-ago/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Mon, 12 Nov 2018 08:42:26 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[Bayes theorem]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[matplotlib]]></category>
		<category><![CDATA[naive bayes]]></category>
		<category><![CDATA[numpy]]></category>
		<category><![CDATA[opencv]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=2269</guid>

					<description><![CDATA[<p>In my article My personal road map for learning data science in 2018 I wrote about how I try to tackle the data science knowledge sphere. As 2018 is slowly coming to an end, I think it is time for a little wrap-up. What are the things I learned about&#8230;</p>
<p>The post <a href="https://creatronix.de/10-things-i-didnt-know-about-data-science-a-year-ago/">10 things I didn&#8217;t know about Data Science a year ago</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>In my article <a href="https://creatronix.de/my-personal-road-map-for-learning-data-science/">My personal road map for learning data science in 2018</a> I wrote about how I try to tackle the data science knowledge sphere. As 2018 is slowly coming to an end, I think it is time for a little wrap-up.</p>
<p>What are the things I learned about Data Science in 2018? Here we go:</p>
<h2>The difference between Data Science, Machine Learning, Deep Learning and AI</h2>
<p><img fetchpriority="high" decoding="async" class="alignnone size-full wp-image-2276" src="https://creatronix.de/wp-content/uploads/2018/10/data_science_vs_ml.png" alt="" width="514" height="392" srcset="https://creatronix.de/wp-content/uploads/2018/10/data_science_vs_ml.png 514w, https://creatronix.de/wp-content/uploads/2018/10/data_science_vs_ml-300x229.png 300w" sizes="(max-width: 514px) 100vw, 514px" /></p>
<p>A picture says more than a thousand words.</p>
<h2>The difference between supervised and unsupervised learning</h2>
<p><em>Supervised Learning</em></p>
<p>You have training and test data with <strong>labels</strong>. A label tells you, for example, to which class a certain data item belongs. Imagine you have images of pets and the labels are the names of the pets.</p>
<p><em>Unsupervised Learning</em></p>
<p>Your data doesn&#8217;t have labels. Your algorithm, e.g. k-means clustering, needs to figure out a structure given only the data.</p>
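<p>A minimal sketch of the difference, using scikit-learn and made-up toy data: the classifier gets features <strong>and</strong> labels, while k-means gets only the features:</p>
<pre>import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X = np.array([[1.0], [1.2], [8.0], [8.3]])

# supervised: features AND labels
y = ["cat", "cat", "dog", "dog"]
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[1.1]]))  # -> ['cat']

# unsupervised: only the features, the structure is inferred
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)</pre>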
<h2>The areas of applied machine learning</h2>
<p>These are described in <a href="https://creatronix.de/the-essence-of-machine-learning/">The Essence of Machine Learning</a> and <a href="https://creatronix.de/data-science-overview/">Data Science Overview</a>.</p>
<h2>Bayes Theorem</h2>
<p>In my article <a href="https://creatronix.de/bayes-theorem/">Bayes theorem</a> I elaborated on the <strong>base rate fallacy</strong>, and in <a href="https://creatronix.de/lesson-2-naive-bayes/">naive bayes</a> I recapped the second lesson from udacity&#8217;s <a href="https://creatronix.de/ud120-intro-to-machine-learning/">UD120 Intro to Machine Learning</a>.</p>
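<p>To illustrate the base rate fallacy numerically (the numbers here are made up): even with a quite accurate test, a rare condition with 1% prevalence leads to a surprisingly low probability of actually being sick given a positive result:</p>
<pre># made-up numbers: 1% prevalence, 99% sensitivity, 5% false positive rate
p_disease = 0.01
p_pos_given_disease = 0.99      # sensitivity
p_pos_given_healthy = 0.05      # false positive rate

# total probability of a positive test
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes theorem: P(disease | positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # -> 0.167</pre>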
<h2>Precision and Recall and ROC</h2>
<p>In my article <a href="https://creatronix.de/classification-precision-and-recall/">classification: precision and recall</a> I wrote about different useful measures to evaluate the quality of a supervised learning algorithm.</p>
<p>In <a href="https://creatronix.de/receiver-operating-characteristic/">Receiver Operating Characteristic</a> I wrote about another useful measure, the ROC curve.</p>
<h2>Visualization with matplotlib</h2>
<p>Matplotlib is a really good starting point for visualization. I wrote about it in <a href="https://creatronix.de/introduction-to-matplotlib/">Introduction to matplotlib</a>, <a href="https://creatronix.de/introduction-to-matplotlib-part-2/">Matplotlib &#8211; Part 2</a>, <a href="https://creatronix.de/scatterplot-with-matplotlib/">Scatterplot with matplotlib</a></p>
<h2>Math with numpy</h2>
<p>I wrote some articles about the usage of numpy but only scratched the surface of this mighty library:</p>
<ul>
<li><a href="https://creatronix.de/linear-algebra-with-numpy-part-1/">Linear Algebra with numpy &#8211; Part 1</a></li>
<li><a href="https://creatronix.de/numpy-random-choice/">numpy random choice</a></li>
<li><a href="https://creatronix.de/numpy-linspace-function/">Numpy linspace function</a></li>
</ul>
<h2>Image manipulation with OpenCV</h2>
<p><a href="https://creatronix.de/intro-to-opencv-with-python/">Intro to OpenCV with Python</a></p>
<h2>JuPyter Notebooks</h2>
<p>Sometimes I love them, sometimes I hate them. I wrote an <a href="https://creatronix.de/introduction-to-jupyter-notebook/">Introduction to JuPyter Notebook</a>.</p>
<h2>Podcasts</h2>
<p>In 2018 I&#8217;ve listened to a bunch of great podcasts on iTunes:</p>
<ul>
<li><a href="https://lineardigressions.com/">Linear digressions</a></li>
<li><a href="https://lexfridman.com/ai/">MIT Lex Fridman</a></li>
<li><a href="https://itunes.apple.com/de/podcast/self-driving-cars-dr-lance-eliot-podcast-series/id1330558096?mt=2">Dr. Lance Eliot</a></li>
</ul>
<p>&nbsp;</p>
<p>The post <a href="https://creatronix.de/10-things-i-didnt-know-about-data-science-a-year-ago/">10 things I didn&#8217;t know about Data Science a year ago</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Classification: Precision and Recall</title>
		<link>https://creatronix.de/classification-precision-and-recall/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Thu, 28 Jun 2018 15:10:00 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[accuracy]]></category>
		<category><![CDATA[cat]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[dog]]></category>
		<category><![CDATA[f1 score]]></category>
		<category><![CDATA[false positive]]></category>
		<category><![CDATA[precision]]></category>
		<category><![CDATA[recall]]></category>
		<category><![CDATA[true positive]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=1649</guid>

					<description><![CDATA[<p>In the realms of Data Science you&#8217;ll encounter sooner or later the terms &#8220;Precision&#8221; and &#8220;Recall&#8221;. But what do they mean? Clarification Living together with little kids you very often run into classification issues: My daughter really likes dogs, so seeing a dog is something positive. When she sees a normal dog e.g. a&#8230;</p>
<p>The post <a href="https://creatronix.de/classification-precision-and-recall/">Classification: Precision and Recall</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>In the realms of Data Science you&#8217;ll encounter sooner or later the terms &#8220;Precision&#8221; and &#8220;Recall&#8221;. But what do they mean?</p>
<p><img decoding="async" id="im" src="https://i.imgflip.com/3opnpn.jpg" alt="Two Buttons Meme | Recall; Precision | image tagged in memes,two buttons | made w/ Imgflip meme maker" /></p>
<h2>Clarification</h2>
<p>Living together with little kids, you very often run into classification issues:</p>
<p>My daughter really likes dogs, so seeing a dog is something positive. When she sees a normal dog, e.g. a Labrador, and proclaims: &#8220;Look, there is a dog!&#8221;, that&#8217;s a <strong>True Positive (TP)</strong>.</p>
<p>If she sees a fat cat and proclaims: &#8220;Look at the dog!&#8221;, we call it a <strong>False Positive (FP)</strong>, because her assumption of a positive outcome (a dog!) was false. A false positive is also called a Type 1 error.</p>
<p>If I point at a small dog, e.g. a Chihuahua, and say: &#8220;Look at the dog!&#8221; and she cries: &#8220;This is not a dog!&#8221; although it is one, we call that a <strong>False Negative (FN)</strong>. A false negative is also called a Type 2 error.</p>
<p>And last but not least, if I show her a bird and we agree on the bird not being a dog, we have a <strong>True Negative (TN)</strong>.</p>
<p>This neat little matrix shows all of them in context:<br />
<img decoding="async" class="alignnone size-full wp-image-1669" src="https://creatronix.de/wp-content/uploads/2018/06/precision_and_recall.png" alt="" width="479" height="480" srcset="https://creatronix.de/wp-content/uploads/2018/06/precision_and_recall.png 479w, https://creatronix.de/wp-content/uploads/2018/06/precision_and_recall-150x150.png 150w, https://creatronix.de/wp-content/uploads/2018/06/precision_and_recall-300x300.png 300w, https://creatronix.de/wp-content/uploads/2018/06/precision_and_recall-100x100.png 100w" sizes="(max-width: 479px) 100vw, 479px" /></p>
<h2>Precision and Recall</h2>
<p>If I show my daughter twenty pictures of cats and dogs (8 cat pictures and 12 dog pictures) and she identifies 10 as dogs, but among those 10 there are actually 2 cats, her precision is 8 / (8 + 2) = 4/5 or 80%.</p>
<p><strong>Precision = <span style="color: #ff0000;">TP</span> / (<span style="color: #339966;">TP + FP</span>)</strong></p>
<p><img decoding="async" class="alignnone size-full wp-image-1672" src="https://creatronix.de/wp-content/uploads/2018/06/precision.png" alt="" width="479" height="480" srcset="https://creatronix.de/wp-content/uploads/2018/06/precision.png 479w, https://creatronix.de/wp-content/uploads/2018/06/precision-150x150.png 150w, https://creatronix.de/wp-content/uploads/2018/06/precision-300x300.png 300w, https://creatronix.de/wp-content/uploads/2018/06/precision-100x100.png 100w" sizes="(max-width: 479px) 100vw, 479px" /></p>
<p>Knowing that there are actually 12 dog pictures and she missed 4 of them (false negatives), her recall is 8 / (8 + 4) = 2/3 or roughly 67%.</p>
<p><strong>Recall = <span style="color: #ff0000;">TP</span> / (<span style="color: #339966;">TP + FN</span>)</strong></p>
<p><img decoding="async" class="alignnone size-full wp-image-1673" src="https://creatronix.de/wp-content/uploads/2018/06/recall.png" alt="" width="479" height="480" srcset="https://creatronix.de/wp-content/uploads/2018/06/recall.png 479w, https://creatronix.de/wp-content/uploads/2018/06/recall-150x150.png 150w, https://creatronix.de/wp-content/uploads/2018/06/recall-300x300.png 300w, https://creatronix.de/wp-content/uploads/2018/06/recall-100x100.png 100w" sizes="(max-width: 479px) 100vw, 479px" /></p>
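<p>The picture example can be checked with a few lines of Python (counts taken from the example above):</p>
<pre># 12 dog pictures; she labels 10 of them as "dog": 8 real dogs + 2 cats
tp = 8   # dogs correctly identified as dogs
fp = 2   # cats wrongly identified as dogs
fn = 4   # dogs she missed

precision = tp / (tp + fp)   # -> 0.8
recall = tp / (tp + fn)      # -> 0.666...
print(precision, recall)</pre>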
<p>Which measure is more important?</p>
<p>It depends:</p>
<p>If you&#8217;re a dog lover, a high precision is better; if you are afraid of dogs and want to avoid them, a higher recall is better 🙂</p>
<h3>Different terms</h3>
<p>Precision is also called <strong>Positive Predictive Value (PPV)</strong></p>
<p>Recall is also often called:</p>
<ul>
<li>True positive rate</li>
<li>Sensitivity</li>
<li>Probability of detection</li>
</ul>
<h2>Other interesting measures</h2>
<h3>Accuracy</h3>
<p><strong>ACC = (<span style="color: #ff0000;">TP + TN</span>) / (<span style="color: #339966;">TP + FP + TN + FN</span>)</strong></p>
<p><img decoding="async" class="alignnone size-full wp-image-1674" src="https://creatronix.de/wp-content/uploads/2018/06/accuracy.png" alt="" width="479" height="480" srcset="https://creatronix.de/wp-content/uploads/2018/06/accuracy.png 479w, https://creatronix.de/wp-content/uploads/2018/06/accuracy-150x150.png 150w, https://creatronix.de/wp-content/uploads/2018/06/accuracy-300x300.png 300w, https://creatronix.de/wp-content/uploads/2018/06/accuracy-100x100.png 100w" sizes="(max-width: 479px) 100vw, 479px" /></p>
<h3>F1-Score</h3>
<p>You can combine Precision and Recall into a measure called the F1-Score. It is the harmonic mean of precision and recall:</p>
<p><strong>F1 = 2 / (1/Precision + 1/Recall)</strong></p>
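<p>Plugged into the formula with the numbers from the picture example (precision 0.8, recall 2/3):</p>
<pre>precision = 0.8
recall = 8 / 12   # ~0.667

# harmonic mean of precision and recall
f1 = 2 / (1 / precision + 1 / recall)
print(round(f1, 3))  # -> 0.727</pre>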
<h3>Scikit-Learn</h3>
<p>scikit-learn being a one-stop-shop for data scientists does of course offer functions for calculating precision and recall:</p>
<pre>from sklearn.metrics import precision_score

y_true = ["dog", "dog", "not-a-dog", "not-a-dog", "dog", "dog"]
y_pred = ["dog", "not-a-dog", "dog", "not-a-dog", "dog", "not-a-dog"]

print(precision_score(y_true, y_pred, pos_label="dog"))</pre>
<p>Let&#8217;s assume we trained a binary classifier which can tell us &#8220;dog&#8221; or &#8220;not-a-dog&#8221;.</p>
<p>In this example the precision is 0.666 or ~67%, because in two thirds of the cases the algorithm was right when it predicted a dog.</p>
<pre>from sklearn.metrics import recall_score

print(recall_score(y_true, y_pred, pos_label="dog"))</pre>
<p>The recall was just 0.5 or 50% because out of 4 dogs it identified just 2 correctly as dogs.</p>
<pre>from sklearn.metrics import accuracy_score

print(accuracy_score(y_true, y_pred))</pre>
<p>The accuracy was also just 50% because out of 6 items it made only 3 correct predictions.</p>
<pre>from sklearn.metrics import f1_score

print(f1_score(y_true, y_pred, pos_label="dog"))</pre>
<p>The F1 score is ~0.57 &#8211; between the recall of 0.5 and the precision of 0.666.</p>
<p>What other scores do you encounter? &#8211; stay tuned for the next episode 🙂</p>
<p>&nbsp;</p>
<p>The post <a href="https://creatronix.de/classification-precision-and-recall/">Classification: Precision and Recall</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Lesson 2: Naive Bayes</title>
		<link>https://creatronix.de/lesson-2-naive-bayes/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Tue, 19 Jun 2018 06:29:16 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[naive bayes]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=1388</guid>

					<description><![CDATA[<p>Lesson 2 of the Udacity Course UD120 &#8211; Intro to Machine Learning deals with Naive Bayes classification. Mini project For the mini project you should fork https://github.com/udacity/ud120-projects and clone it. It is recommended to install a 64-bit Python 2.7 version because ML means heavy data processing and can easily eat up more than 2GB of&#8230;</p>
<p>The post <a href="https://creatronix.de/lesson-2-naive-bayes/">Lesson 2: Naive Bayes</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Lesson 2 of the Udacity Course UD120 &#8211; Intro to Machine Learning deals with Naive Bayes classification.</p>
<h2>Mini project</h2>
<p>For the mini project you should fork <a href="https://github.com/udacity/ud120-projects">https://github.com/udacity/ud120-projects</a> and clone it. It is recommended to install a 64-bit Python 2.7 version because ML means heavy data processing and can easily eat up more than 2 GB of memory.</p>
<h3>Dependencies</h3>
<p>After cloning the repo I would recommend setting up a venv and installing the requirements:</p>
<ul>
<li>sklearn</li>
<li>numpy</li>
<li>scipy</li>
<li>matplotlib</li>
</ul>
<h3>The Code</h3>
<p>The code itself is pretty straightforward:</p>
<ul>
<li>Instantiate the classifier</li>
<li>Train (fit) the Classifier</li>
<li>Predict</li>
<li>Calculate accuracy</li>
</ul>
<pre>from time import time
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# training
print("Start training")
t0 = time()
clf = GaussianNB()
clf.fit(features_train, labels_train)
print("training time:", round(time() - t0, 3), "s")

# prediction
print("start predicting")
t0 = time()
prediction = clf.predict(features_test)
print("predict time:", round(time() - t0, 3), "s")

# accuracy
print("Calculating accuracy")
accuracy = accuracy_score(labels_test, prediction)
print("Accuracy calculated, and the accuracy is", accuracy)</pre>
<p>The output on my machine:</p>
<pre>training time: 1.762 s
start predicting
predict time: 0.286 s
Calculating accuracy
Accuracy calculated, and the accuracy is 0.9732650739476678</pre>
<p>The simple Gaussian Naive Bayes is pretty accurate at 97.3%.</p>
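<p>The snippet above relies on the preprocessed email data from the course repo. As a self-contained alternative you can run the same workflow on scikit-learn&#8217;s bundled Iris dataset (note: the accuracy will differ from the email example):</p>
<pre>from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# load a toy dataset and split it into training and test data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# instantiate, fit, predict, evaluate
clf = GaussianNB()
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(accuracy)</pre>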
<p>The post <a href="https://creatronix.de/lesson-2-naive-bayes/">Lesson 2: Naive Bayes</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Linear Algebra with numpy</title>
		<link>https://creatronix.de/linear-algebra-with-numpy/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Fri, 04 May 2018 11:42:22 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[numpy]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=1314</guid>

					<description><![CDATA[<p>Numpy is a package for scientific computing in Python. It is blazing fast due to its implementation in C. It is often used together with pandas, matplotlib and Jupyter notebooks. Often these packages are referred to as the datascience stack. Installation You can install numpy via pip pip install numpy Basic Usage In the datascience&#8230;</p>
<p>The post <a href="https://creatronix.de/linear-algebra-with-numpy/">Linear Algebra with numpy</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Numpy is a package for scientific computing in Python. It is blazing fast due to its implementation in C.</p>
<p>It is often used together with <a href="https://creatronix.de/introduction-to-pandas/">pandas</a>, <a href="https://creatronix.de/introduction-to-matplotlib/">matplotlib</a> and <a href="https://creatronix.de/introduction-to-jupyter-notebook/">Jupyter</a> notebooks. Together these packages are often referred to as the data science stack.</p>
<h2>Installation</h2>
<p>You can install numpy via pip</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>pip install numpy</code></pre>
</div>
<h2>Basic Usage</h2>
<p>In the data science world numpy is often imported like this:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>import numpy as np</code></pre>
</div>
<p>The &#8220;as&#8221; keyword defines a so-called alias. Now you can use structures from numpy by referencing them with &#8220;np&#8221; instead of the whole name.</p>
<p>Think &#8220;abbreviation&#8221;.</p>
<h3>n-dimensional array</h3>
<p>The most important data structure is ndarray, which is short for n-dimensional array.</p>
<p>You can convert a list to a numpy array with the array function:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>my_list = [1, 2, 3, 4] 
my_array = np.array(my_list)</code></pre>
</div>
<p>You can also convert an array back to a list with the tolist method:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>my_new_list = my_array.tolist()</code></pre>
</div>
<p>You can retrieve the dimensionality of an array with the ndim property:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>my_array.ndim</code></pre>
</div>
<p>and get the number of elements along each dimension with the shape property:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>my_array.shape</code></pre>
</div>
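<p>For example, for a 2-dimensional array:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.ndim)    # 2 dimensions
print(m.shape)   # (2, 3): 2 rows, 3 columns</code></pre>
</div>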
<h2>Vector arithmetic</h2>
<h3>Addition / Subtraction</h3>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>a = np.array([1, 2, 3, 4]) 
b = np.array([4, 3, 2, 1]) 
a + b 
array([5, 5, 5, 5]) 

a - b 
array([-3, -1, 1, 3])</code></pre>
</div>
<h3>Scalar Multiplication</h3>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>a = np.array([1, 2, 3, 4]) 
a * 3 

array([3, 6, 9, 12])</code></pre>
</div>
<p>To see why it is charming to use numpy&#8217;s arrays for this operation, you have to consider the alternative:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>c = [1,2,3,4] 
d = [x * 3 for x in c]</code></pre>
</div>
<h3>Dot Product</h3>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>a = np.array([1,2,3,4]) 
b = np.array([4,3,2,1]) 
a.dot(b) 

20 # 1*4 + 2*3 + 3*2 + 4*1</code></pre>
</div>
<p>Learn more about numpy:</p>
<p><a href="https://creatronix.de/numpy-random-choice/">numpy random choice</a></p>
<p><a href="https://creatronix.de/numpy-linspace-function/">Numpy linspace function</a></p>
<p><a href="https://github.com/jboegeholz/introduction_to_numpy/blob/master/01_numpy_arrays.ipynb">Project on github</a></p>
<p>The post <a href="https://creatronix.de/linear-algebra-with-numpy/">Linear Algebra with numpy</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Introduction to Jupyter Notebook</title>
		<link>https://creatronix.de/introduction-to-jupyter-notebook/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Wed, 25 Apr 2018 09:17:48 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[jupyter]]></category>
		<category><![CDATA[notebook]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=1214</guid>

					<description><![CDATA[<p>JuPyteR Do You know the feeling of being already late to a party when encountering something new? But when you actually start telling others about it, you realize that it is not too common knowledge at all, e.g. Jupyter Notebooks. What is a Jupyter notebook? In my own words: a browser-based document-oriented command line style&#8230;</p>
<p>The post <a href="https://creatronix.de/introduction-to-jupyter-notebook/">Introduction to Jupyter Notebook</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h2>JuPyteR</h2>
<h2><img decoding="async" class="wp-image-2253 size-full" src="https://creatronix.de/wp-content/uploads/2018/10/img_5986.jpg" width="500" height="627" srcset="https://creatronix.de/wp-content/uploads/2018/10/img_5986.jpg 500w, https://creatronix.de/wp-content/uploads/2018/10/img_5986-239x300.jpg 239w" sizes="(max-width: 500px) 100vw, 500px" /></h2>
<p>Do you know the feeling of being late to a party when encountering something new?</p>
<p>But when you actually start telling others about it, you realize that it is not common knowledge at all &#8211; e.g. <a href="http://jupyter.org/">Jupyter Notebooks</a>.</p>
<p>What is a Jupyter notebook?</p>
<p>In my own words: a browser-based, document-oriented, command-line-style exploration tool for <strong>Ju</strong>lia, <strong>Py</strong>thon and <strong>R</strong> &#8211; hence the name JuPyteR.</p>
<p>Ok, let&#8217;s break it down:</p>
<h3>Browser-based</h3>
<p>JuPyter follows a client-server concept: you edit your code in a web form in the browser and send the input of a cell to the server backend for execution; the server sends back a response which is rendered in your browser.</p>
<h3><img decoding="async" src="https://creatronix.de/wp-content/uploads/2018/04/iris_jupyter-1024x402.png" alt="" class="alignnone size-large wp-image-5952" width="1024" height="402" srcset="https://creatronix.de/wp-content/uploads/2018/04/iris_jupyter-1024x402.png 1024w, https://creatronix.de/wp-content/uploads/2018/04/iris_jupyter-300x118.png 300w, https://creatronix.de/wp-content/uploads/2018/04/iris_jupyter-768x302.png 768w, https://creatronix.de/wp-content/uploads/2018/04/iris_jupyter-1536x603.png 1536w, https://creatronix.de/wp-content/uploads/2018/04/iris_jupyter.png 1858w" sizes="(max-width: 1024px) 100vw, 1024px" /></h3>
<h3>Document-oriented</h3>
<p>One great aspect of JuPyter is that you can enrich your code with headlines and markdown, so that you have a single document containing code, the results of the code execution and documentation.</p>
<p><a href="http://jupyter.org/"><img decoding="async" src="https://creatronix.de/wp-content/uploads/2018/04/markdown_jupyter-1024x780.png" alt="" class="alignnone size-large wp-image-5965" width="1024" height="780" srcset="https://creatronix.de/wp-content/uploads/2018/04/markdown_jupyter-1024x780.png 1024w, https://creatronix.de/wp-content/uploads/2018/04/markdown_jupyter-300x229.png 300w, https://creatronix.de/wp-content/uploads/2018/04/markdown_jupyter-768x585.png 768w, https://creatronix.de/wp-content/uploads/2018/04/markdown_jupyter.png 1482w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></p>
<h2>Installation and Run</h2>
<p>If you already have a Python installation you can use either pip or pipenv to install JuPyter:</p>
<h3>Pip</h3>
<pre>pip install jupyter</pre>
<h3>Pipenv</h3>
<pre>pipenv install jupyter</pre>
<p>After installation you can start it on the console with:</p>
<pre>jupyter notebook</pre>
<p>An alternative way is to use the <a href="https://www.anaconda.com/download/">anaconda distribution</a>.</p>
<h2>Disadvantages</h2>
<p>One big drawback (when your background is software development) is that you don&#8217;t have code completion.</p>
<p>Another disadvantage: modularization of your code is not easy.</p>
<p>Versioning is an issue as well: because a Jupyter notebook&#8217;s JSON file contains code as well as generated artifacts like plots, every re-run of a notebook changes the file, and the diff is not easily comprehensible.</p>
<h2>PyCharm Integration</h2>
<p>For the code completion issue JetBrains comes to the rescue: the PyCharm IDE has an integrated JuPyter editor which supports code completion.</p>
<h2><img decoding="async" src="https://creatronix.de/wp-content/uploads/2018/04/pycharm_jupyter-1024x351.png" alt="" class="alignnone size-large wp-image-5967" width="1024" height="351" srcset="https://creatronix.de/wp-content/uploads/2018/04/pycharm_jupyter-1024x351.png 1024w, https://creatronix.de/wp-content/uploads/2018/04/pycharm_jupyter-300x103.png 300w, https://creatronix.de/wp-content/uploads/2018/04/pycharm_jupyter-768x263.png 768w, https://creatronix.de/wp-content/uploads/2018/04/pycharm_jupyter-1536x526.png 1536w, https://creatronix.de/wp-content/uploads/2018/04/pycharm_jupyter-2048x701.png 2048w" sizes="(max-width: 1024px) 100vw, 1024px" /></h2>
<h2>Useful Keyboard Shortcuts</h2>
<ul>
<li>Cmd + Shift + P (macOS) / Ctrl + Shift + P (Linux and Windows): open the command palette</li>
<li>Ctrl + Enter: run cell</li>
<li>Alt + Enter: run cell and insert new cell below</li>
</ul>
<p><a href="https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/">https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/</a></p>
<p>The post <a href="https://creatronix.de/introduction-to-jupyter-notebook/">Introduction to Jupyter Notebook</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Data Science Datasets: Iris flower data set</title>
		<link>https://creatronix.de/data-science-datasets-iris-flower-data-set/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Wed, 25 Apr 2018 08:55:12 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[iris flower data set]]></category>
		<category><![CDATA[scikit-learn]]></category>
		<category><![CDATA[sklearn]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=1373</guid>

					<description><![CDATA[<p>Motivation When you are going to learn some data science the acquisition of data is often the first step. To get you started scikit-learn comes with a bunch of so-called &#8220;toy datasets&#8221;. One of them is the Iris dataset. Prerequisites &#38; Imports Besides scikit-learn we will use pandas for data handling and matplotlib with&#8230;</p>
<p>The post <a href="https://creatronix.de/data-science-datasets-iris-flower-data-set/">Data Science Datasets: Iris flower data set</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h2>Motivation</h2>
<p>When you are going to learn some data science, the acquisition of data is often the first step.</p>
<p>To get you started scikit-learn comes with a bunch of so-called &#8220;toy datasets&#8221;. One of them is the Iris dataset.</p>
<h2>Prerequisites &amp; Imports</h2>
<p>Besides scikit-learn we will use <a href="https://creatronix.de/introduction-to-pandas/">pandas</a> for data handling and <a href="https://creatronix.de/introduction-to-matplotlib/">matplotlib</a> with seaborn for visualization. So let&#8217;s install them:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-bash" data-lang="Bash"><code>pip install scikit-learn pandas seaborn matplotlib</code></pre>
</div>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>from sklearn import datasets
import seaborn as sns
import pandas as pd
sns.set_palette('husl')
import matplotlib.pyplot as plt
%matplotlib inline</code></pre>
</div>
<h2>Iris data set</h2>
<p>The Iris flower data set or Fisher&#8217;s Iris data set became a typical test case for many statistical classification techniques in machine learning such as support vector machines.</p>
<p>It is sometimes called Anderson&#8217;s <i>Iris</i> data set because Edgar Anderson collected the data to quantify the morphological variation of <i>Iris</i> flowers of three related species.</p>
<p>This data set can be imported from scikit-learn like the following:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>iris = datasets.load_iris() 
</code></pre>
</div>
<div>
<h2>Convert to Pandas Dataframe</h2>
</div>
<p>To work with the dataset we convert it into a pandas dataframe.</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>df = pd.DataFrame(
    iris['data'],
    columns=iris['feature_names']
)
df['species'] = iris['target']
df['species'] = df['species'].map({
    0 : 'Iris-setosa',
    1 : 'Iris-versicolor',
    2 : 'Iris-virginica'
})</code></pre>
</div>
<div>
<h2>Data visualization</h2>
<p>Seaborn has a nice way to visualize data for exploration with the pairplot function.</p>
<p>It takes every feature and compares it pairwise with every other feature:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code>g = sns.pairplot(df, hue='species', markers='+')
plt.show()</code></pre>
</div>
</div>
<h2><img decoding="async" src="https://creatronix.de/wp-content/uploads/2018/04/iris_sns_pairplot-1024x888.png" alt="" class="alignnone size-large wp-image-5972" width="1024" height="888" srcset="https://creatronix.de/wp-content/uploads/2018/04/iris_sns_pairplot-1024x888.png 1024w, https://creatronix.de/wp-content/uploads/2018/04/iris_sns_pairplot-300x260.png 300w, https://creatronix.de/wp-content/uploads/2018/04/iris_sns_pairplot-768x666.png 768w, https://creatronix.de/wp-content/uploads/2018/04/iris_sns_pairplot.png 1137w" sizes="(max-width: 1024px) 100vw, 1024px" /></h2>
<h2>Further Reading</h2>
<p><a href="https://scikit-learn.org/stable/datasets/toy_dataset.html#iris-plants-dataset">https://scikit-learn.org/stable/datasets/toy_dataset.html#iris-plants-dataset</a></p>
<p><a href="https://www.kaggle.com/code/jchen2186/machine-learning-with-iris-dataset">https://www.kaggle.com/code/jchen2186/machine-learning-with-iris-dataset</a></p>
<p><a href="https://creatronix.de/introduction-to-jupyter-notebook/">Introduction to Jupyter Notebook</a></p>
<p><a href="https://creatronix.de/introduction-to-pandas/">Introduction to Pandas</a></p>
<p><a href="https://creatronix.de/pandas-cheat-sheet/">Pandas Cheat Sheet</a></p>
<p><a href="https://creatronix.de/introduction-to-matplotlib/">Introduction to matplotlib</a></p>
<p>The post <a href="https://creatronix.de/data-science-datasets-iris-flower-data-set/">Data Science Datasets: Iris flower data set</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Data Science Overview</title>
		<link>https://creatronix.de/data-science-overview/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Wed, 07 Mar 2018 09:40:28 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[Supervised Learning]]></category>
		<category><![CDATA[Unsupervised Learning]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=1150</guid>

					<description><![CDATA[<p>Questions Data Science tries to answer one of the following questions: Classification -&#62; &#8220;Is it A or B?&#8221; Clustering -&#62; &#8220;Are there groups which belong together?&#8221; Regression -&#62; &#8220;How will it develop in the future?&#8221; Association -&#62; &#8220;What is happening very often together?&#8221; There are two ways to tackle these problem domains with machine learning:&#8230;</p>
<p>The post <a href="https://creatronix.de/data-science-overview/">Data Science Overview</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h2>Questions</h2>
<p>Data Science tries to answer one of the following questions:</p>
<ul>
<li>Classification -&gt; &#8220;Is it A or B?&#8221;</li>
<li>Clustering -&gt; &#8220;Are there groups which belong together?&#8221;</li>
<li>Regression -&gt; &#8220;How will it develop in the future?&#8221;</li>
<li>Association -&gt; &#8220;What is happening very often together?&#8221;</li>
</ul>
<p>There are two ways to tackle these problem domains with machine learning:</p>
<ol>
<li>Supervised Learning</li>
<li>Unsupervised Learning</li>
</ol>
<h2>Supervised Learning</h2>
<p>You have training and test data with <strong>labels</strong>. A label tells you, for example, to which class a certain data item belongs. Imagine you have images of pets and the labels are the names of the pets.</p>
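<p>A minimal sketch of the supervised setting on scikit-learn&#8217;s Iris data; my choice of a k-nearest-neighbors classifier here is just one example among many:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code># Supervised learning: fit a classifier on labeled data, evaluate on held-out data
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # features and labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(accuracy)  # typically above 0.9 on this easy data set</code></pre>
</div>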
<h2>Unsupervised Learning</h2>
<p>Your data doesn&#8217;t have labels. An algorithm such as k-means clustering has to figure out a structure given only the data.</p>
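<p>A minimal sketch of the unsupervised setting, letting k-means search for three clusters in the Iris measurements without ever seeing the labels:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code># Unsupervised learning: k-means gets only the data, no labels
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

X, _ = load_iris(return_X_y=True)  # the labels are deliberately ignored
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(kmeans.labels_[:10])            # cluster index assigned to the first samples
print(kmeans.cluster_centers_.shape)  # (3, 4): one center per cluster</code></pre>
</div>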
<p><iframe title="[S1E2] Back to The Future | 5 Minutes With Ingo" width="1200" height="675" src="https://www.youtube.com/embed/zDxh1dEt_Mo?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe></p>
<p>The post <a href="https://creatronix.de/data-science-overview/">Data Science Overview</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>My personal roadmap for learning data science in 2018</title>
		<link>https://creatronix.de/my-personal-road-map-for-learning-data-science/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Wed, 13 Dec 2017 14:05:14 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[Self-Improvement & Personal Finance]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[new year's resolution]]></category>
		<category><![CDATA[numpy]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[road map]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=1177</guid>

					<description><![CDATA[<p>I got confused by all the buzzwords: data science, machine learning, deep learning, neural nets, artificial intelligence, big data, and so on and so on. As an engineer I like to put some structure to the chaos. Inspired by Roadmap: How to Learn Machine Learning in 6 Months and Tetiana Ivanova &#8211; How to become&#8230;</p>
<p>The post <a href="https://creatronix.de/my-personal-road-map-for-learning-data-science/">My personal roadmap for learning data science in 2018</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>I got confused by all the buzzwords: data science, machine learning, deep learning, neural nets, artificial intelligence, big data, and so on and so on.</p>
<p><img decoding="async" class="alignnone size-full wp-image-1253" src="https://creatronix.de/wp-content/uploads/2017/12/normal_distribution_3.png" alt="" width="487" height="469" srcset="https://creatronix.de/wp-content/uploads/2017/12/normal_distribution_3.png 487w, https://creatronix.de/wp-content/uploads/2017/12/normal_distribution_3-300x289.png 300w" sizes="(max-width: 487px) 100vw, 487px" /></p>
<p>As an engineer I like to put some structure to the chaos. Inspired by <a href="https://youtu.be/MOdlp1d0PNA"><span id="eow-title" class="watch-title" dir="ltr" title="Roadmap: How to Learn Machine Learning in 6 Months">Roadmap: How to Learn Machine Learning in 6 Months</span></a> and <a href="https://youtu.be/rIofV14c0tc"><span id="eow-title" class="watch-title" dir="ltr" title="Tetiana Ivanova - How to become a Data Scientist in 6 months a hacker’s approach to career planning">Tetiana Ivanova &#8211; How to become a Data Scientist in 6 months: a hacker&#8217;s approach to career planning</span></a> I built my own learning roadmap for this year:<br />
So 2018 will be all about data science. Having heard about the <a href="http://jarche.com/pkm/">Personal Knowledge Mastery</a> concept at SWEC17, I am going to tackle the learning process on several levels.</p>
<h2>Watch the Pros</h2>
<p>Thanks to open courseware there are tons of awesome university courses online, e.g.:</p>
<p><a href="https://youtu.be/C1lhuz6pZC0">MIT 6.0002 Introduction to Computational Thinking and Data Science</a></p>
<h2>Learn the tools</h2>
<p>There is already a whole bunch of tools that we can consider part of a standard data science stack. Because my main language is Python, the focus is mostly on Python modules.</p>
<ul>
<li><a href="https://creatronix.de/introduction-to-jupyter-notebook/">Jupyter Notebook</a></li>
<li><a href="https://creatronix.de/linear-algebra-with-numpy-part-1/">numpy</a></li>
<li>pandas</li>
<li><a href="https://seaborn.pydata.org/">seaborn</a></li>
<li><a href="https://bokeh.pydata.org/en/latest/">bokeh</a></li>
<li><a href="http://holoviews.org/">holoviews</a></li>
<li><a href="http://scikit-learn.org/stable/">scikit-learn</a></li>
<li><a href="https://keras.io/">keras</a> / <a href="https://www.tensorflow.org/">TensorFlow</a></li>
<li>Tableau</li>
</ul>
<h2>Finishing Udacity / Udemy courses</h2>
<p>To brush up my python skills and my knowledge of basic computer science I will finish some already started online courses:</p>
<ul>
<li>[  ] <a href="https://creatronix.de/ud120-intro-to-machine-learning/">Introduction to Machine Learning</a></li>
<li>[  ] Python Bootcamp</li>
<li>[  ] Algorithms and Data Structures</li>
<li>[  ] Introduction to Artificial Intelligence</li>
<li>[  ] <a href="https://classroom.udacity.com/courses/ud810/">Introduction to computer vision</a></li>
<li>[  ] <a href="https://classroom.udacity.com/courses/cs373">Artificial Intelligence for Robotics</a></li>
</ul>
<h2>Reading data science books</h2>
<p>To get a broad overview I bought two books on DS / ML:</p>
<ul>
<li>[  ] Data Science from Scratch</li>
<li>[  ] Hands on Machine Learning</li>
</ul>
<h2>Do Exercises on Kaggle</h2>
<ul>
<li>[x] Create Account at Kaggle</li>
<li>[  ] Do first exercise</li>
<li>[  ] Participate in a contest</li>
</ul>
<h2>Visit Meetups about Data Science</h2>
<p>[  ] Visit <a href="https://www.meetup.com/de-DE/Nuernberg-Big-Data/?_af_cid=Nuernberg-Big-Data">Big Data Meetup Events</a></p>
<h2>Add some Peer Pressure</h2>
<p>My brother-in-law and I teamed up and built a WhatsApp learning &amp; exchange group. We currently have four members.</p>
<h2>Write Blog Articles</h2>
<p>I will try to incorporate some of the stuff I&#8217;ve learned into blog articles.</p>
<p>I have already written:</p>
<ul>
<li><a href="https://creatronix.de/bayes-theorem-part-1/">Bayes’ Theorem Part 1</a></li>
<li><a href="https://creatronix.de/data-science-overview/">Data Science Overview</a></li>
<li><a href="https://creatronix.de/classification-precision-and-recall/">Classification: Precision and Recall</a></li>
<li><a href="https://creatronix.de/confusion-matrix/">Confusion Matrix</a></li>
<li><a href="https://creatronix.de/ud120-intro-to-machine-learning/">UD120 Intro to Machine Learning</a></li>
<li><a href="https://creatronix.de/lesson-2-naive-bayes/">Lesson 2: Naive Bayes</a></li>
<li><a href="https://creatronix.de/lesson3-support-vector-machines/">Lesson 3: Support Vector Machines</a></li>
</ul>
<p>So stay tuned!</p>
<p>The post <a href="https://creatronix.de/my-personal-road-map-for-learning-data-science/">My personal roadmap for learning data science in 2018</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Bayes’ Theorem</title>
		<link>https://creatronix.de/bayes-theorem/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Sun, 03 Dec 2017 16:26:26 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[base rate fallacy]]></category>
		<category><![CDATA[Bayes theorem]]></category>
		<category><![CDATA[data science]]></category>
		<guid isPermaLink="false">https://creatronix.de/?p=1171</guid>

					<description><![CDATA[<p>Imagine that you come home from a party and you are stopped by the police. They ask you to take a drug test and you accept. The test result is positive. You are guilty. But wait a minute! Is it really that simple? In Germany about 2.8 million people consume weed on a regular basis,&#8230;</p>
<p>The post <a href="https://creatronix.de/bayes-theorem/">Bayes’ Theorem</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Imagine that you come home from a party and you are stopped by the police. They ask you to take a drug test and you accept. The test result is positive. You are guilty.</p>
<p>But wait a minute! Is it really that simple?</p>
<p>In Germany about 2.8 million people consume weed on a regular basis, that&#8217;s about 3.5% of the population.</p>
<p>Let&#8217;s say D stands for a drug user who consumes weed regularly and ¬D for a person who consumes no weed. The chance that a randomly picked person is a drug user is P(D) = 0.035.</p>
<p>Because you either take drugs or you don&#8217;t, the remaining part must be the non-drug takers: P(¬D) = 1 &#8211; P(D) = 0.965.</p>
<p>The accuracy of a drug test is about 92%. So let&#8217;s assume that there are 8% false positives and 8% false negatives as well.</p>
<p>The chance that the test result is positive if a person actually takes drugs is P(+|D) = 0.92. But you also get a positive result in 8% of all cases in which a person doesn&#8217;t take drugs: P(+|¬D) = 0.08. These are called &#8220;false positives&#8221;.</p>
<p>When a person doesn&#8217;t take drugs, the test will be negative in 92% of all cases: P(-|¬D) = 0.92. And of course a test can also be negative even if a person takes drugs: P(-|D) = 0.08. These are called &#8220;false negatives&#8221;. Got it?</p>
<p>What comes next?</p>
<h2>Combined Probabilities</h2>
<p>Knowing the success and error rates of the test and the relative distribution of drug consumers we can calculate the combined probabilities:</p>
<ul>
<li>P(+, D) = 0.035 * 0.92 = 0.0322
<ul>
<li>Think: The test is <strong>positive AND</strong> the person is a drug user</li>
</ul>
</li>
<li>P(+, ¬D) = 0.965 * 0.08 = 0.0772
<ul>
<li>Think: The test is <strong>positive AND</strong> the person is <strong>NOT</strong> a drug user</li>
</ul>
</li>
<li>P(-, D) = 0.035 * 0.08 = 0.0028
<ul>
<li>Think: The test is <strong>negative AND</strong> the person is a drug user</li>
</ul>
</li>
<li>P(-, ¬D) = 0.965 * 0.92 = 0.8878
<ul>
<li>Think: The test is <strong>negative AND</strong> the person is <strong>NOT</strong> a drug user</li>
</ul>
</li>
</ul>
<h2>Bayes&#8217; Theorem</h2>
<p>P(A|B) = P(B|A) * P(A) / P(B)</p>
<p>In our case we are interested in the probability of a person being a drug addict given the test is positive. That means:</p>
<p>P(D | +) = P(+ | D) * P(D) / P(+) = P(+ | D) * P(D) / ( P(+, D) + P(+, ¬D) )</p>
<p>= 0.92 * 0.035 / (0.0322 + 0.0772) = <strong>0.294</strong></p>
<p>The outcome is quite interesting and mildly shocking: the probability that a person who tested positive is actually a drug user is only around 29%, less than one third!</p>
<p>Why is this so counterintuitive when the test states an accuracy of 92%? This is the so-called <strong>base rate fallacy</strong>: we have to take into account that only 3.5% of the population actually take drugs.</p>
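<p>The whole calculation fits in a few lines of Python, using the numbers from above:</p>
<div class="hcb_wrap">
<pre class="prism line-numbers lang-python" data-lang="Python"><code># Bayes' theorem for the drug test example
p_d = 0.035               # P(D): prior probability of being a regular consumer
p_not_d = 1 - p_d         # P(¬D)
p_pos_given_d = 0.92      # P(+|D): true positive rate
p_pos_given_not_d = 0.08  # P(+|¬D): false positive rate

# P(+) via the law of total probability
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * p_not_d

# P(D|+) = P(+|D) * P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 3))  # 0.294</code></pre>
</div>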
<h2>Further reading about drug tests</h2>
<p><a href="https://www.webmd.com/drug-medication/news/20100528/drug-tests-often-trigger-false-positives#1">Drug tests generally produce false-positive results in 5% to 10% of cases and false negatives in 10% to 15% of cases, new research shows.</a></p>
<p>The post <a href="https://creatronix.de/bayes-theorem/">Bayes’ Theorem</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
