
After trying a couple of times to get into a Management 3.0 training, I finally had the chance to participate in a two-day course in Nuremberg.

The training was hosted at Ancud IT in downtown Nuremberg. The quiet working environment was awesome, and the catering for the coffee breaks was incredibly good. Kudo cards to the Ancud team, but more about Kudo Cards later.

Jürgen Mohr works as an independent agile coach / Scrum Master and Management 3.0 trainer. He has used the M3.0 approach in a couple of projects and had a lot of stories to tell from his work experience.

So what is Management 3.0?

The overall question is: “What does a manager still do in a complex system with self-organizing teams?”

The mindset of Management 3.0 tries to answer this question with “Manage the system, not the people”.

An organization is a complex adaptive system

The inventor of M3.0, Jurgen Appelo, built his concept around six views of agile management:

- energize people
- empower teams
- align constraints
- develop competence
- grow structure
- improve everything

Saskia, Carsten and I had to form a team and find a name that represented our commonalities. We came up with #YORO – You Only Retire Once, because we all considered working less at the age of fifty 🙂

We had to choose our team values from the values list. We came up with:

- Openness
- Agility
- Trust
- Innovation

We tried out a lot of different tools. Jürgen set up a Kudo wall on which we could put Kudo cards to praise the behavior of our colleagues.

- Personal Maps
- Kudo Cards / Wall
- Moving Motivators
- Delegation Poker
- Happiness Door
- Meddler’s Game
- Celebration Grid

There are two must-reads: Management 3.0 and Managing for Happiness. While the first is the scientific one, the other is the “playbook”: it comprises all the tools you can use directly in practice.

Additionally, you can read Daniel Pink's “Drive”.

- First Kudo Card given to a colleague and our HR team
- Personal Maps as part of a team building workshop
- Moving Motivators in a hiring interview


Let me bring some clarity into this topic!

Let’s take the example from Precision and Recall:

y_true = ["dog", "dog", "non-dog", "non-dog", "dog", "dog"]
y_pred = ["dog", "non-dog", "dog", "non-dog", "dog", "non-dog"]

When we look at the predictions, we can count the correct and incorrect classifications:

- dog correctly classified as dog: 2 times (True Positive)
- non-dog incorrectly classified as dog: 1 time (False Positive)
- dog incorrectly classified as non-dog: 2 times (False Negative)
- non-dog correctly classified as non-dog: 1 time (True Negative)
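As a quick sketch, the four counts above can be tallied directly from the two lists:

```python
from collections import Counter

y_true = ["dog", "dog", "non-dog", "non-dog", "dog", "dog"]
y_pred = ["dog", "non-dog", "dog", "non-dog", "dog", "non-dog"]

# Compare truth and prediction pairwise and tally the four outcome types
counts = Counter()
for truth, pred in zip(y_true, y_pred):
    if truth == "dog" and pred == "dog":
        counts["TP"] += 1          # dog correctly classified as dog
    elif truth == "non-dog" and pred == "dog":
        counts["FP"] += 1          # non-dog incorrectly classified as dog
    elif truth == "dog" and pred == "non-dog":
        counts["FN"] += 1          # dog incorrectly classified as non-dog
    else:
        counts["TN"] += 1          # non-dog correctly classified as non-dog

print(counts)  # TP: 2, FP: 1, FN: 2, TN: 1
```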

When we visualize these results in a matrix we already have the confusion matrix:

We can calculate the confusion matrix with sklearn in a very simple manner:

from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_true, y_pred, labels=["dog", "non-dog"]))

The output is:

[[2 2]
 [1 1]]

which can indeed be confusing, because the matrix is transposed. In contrast to our matrix from above, the columns are the predictions and the rows are the actual values:
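One way to keep the rows and columns straight is to wrap sklearn's matrix in a labeled DataFrame. This is just a sketch and assumes pandas is available:

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

y_true = ["dog", "dog", "non-dog", "non-dog", "dog", "dog"]
y_pred = ["dog", "non-dog", "dog", "non-dog", "dog", "non-dog"]

labels = ["dog", "non-dog"]
cm = confusion_matrix(y_true, y_pred, labels=labels)

# In sklearn's layout, rows are the actual classes, columns the predicted ones
df = pd.DataFrame(cm,
                  index=[f"actual {l}" for l in labels],
                  columns=[f"predicted {l}" for l in labels])
print(df)
```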

And that’s all – if you just have a binary classifier.

So what happens, when your classifier can decide between three outcomes, say dog, cat and rabbit? (You can generate the test data with numpy random choice)

y_true = ['rabbit', 'dog', 'rabbit', 'cat', 'cat', 'cat', 'cat', 'dog', 'cat']
y_pred = ['rabbit', 'rabbit', 'dog', 'cat', 'dog', 'rabbit', 'dog', 'cat', 'dog']
cm = confusion_matrix(y_true, y_pred, labels=["dog", "rabbit", "cat"])
print(cm)

[[0 1 1]
 [1 1 0]
 [3 1 1]]

numpy.random.randint(low, high=None, size=None, dtype='l')
numpy.random.random_integers(low, high=None, size=None)

random_integers includes the high boundary while randint does not.
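Note that `random_integers` is deprecated in newer NumPy versions; `randint(low, high + 1)` gives the same inclusive range. A small sketch of the boundary behavior:

```python
import numpy as np

np.random.seed(0)  # make the draws reproducible

# randint excludes the high boundary: these values are drawn from 1..5
a = np.random.randint(1, 6, size=1000)

# the deprecated random_integers(1, 5) would also draw from 1..5,
# i.e. it is equivalent to randint(1, 5 + 1)
print(a.min(), a.max())  # over 1000 draws, both boundaries show up
```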

>>> import numpy as np
>>> np.random.random_integers(5)
4
>>> np.random.random_integers(5, size=(5))
array([5, 3, 4, 1, 4])
>>> np.random.random_integers(5, size=(5, 4))
array([[2, 3, 3, 5],
       [1, 3, 1, 3],
       [5, 3, 3, 4],
       [1, 5, 2, 5],
       [2, 5, 4, 5]])

If you want a random selection of choices from an array, you can use:

>>> import numpy as np
>>> animals = ['dog', 'cat', 'rabbit']
>>> np.random.choice(animals, 9)
array(['dog', 'rabbit', 'dog', 'rabbit', 'dog', 'dog', 'cat', 'dog', 'cat'], dtype='|S6')

Living together with little kids, you very often run into classification issues:

My daughter really likes dogs, so seeing a dog is something positive. When she sees a normal dog, e.g. a Labrador, and proclaims: “Look, there is a dog!”,

that's a **True Positive (TP)**.

If she now sees a fat cat and proclaims: “Look at the dog!”, we call it a **False Positive (FP)**, because her assumption of a positive outcome (a dog!) was false.

If I point at a small dog, e.g. a Chihuahua, and say “Look at the dog!” and she cries: “This is not a dog!”, although it is indeed one, we call that a **False Negative (FN)**.

And last but not least, if I show her a bird and we agree on the bird not being a dog, we have a **True Negative (TN)**.

This neat little matrix shows all of them in context:

If I show my daughter twenty pictures of cats and dogs (8 cat pictures and 12 dog pictures) and she identifies 10 as dogs, but 2 of those 10 are actually cats, her precision is 8 / (8+2) = 4/5 or 80%.

**Precision = TP / (TP + FP)**

Knowing that there are actually 12 dog pictures and she misses 4 (false negatives), her recall is 8 / (8+4) = 2/3 or roughly 67%.

**Recall = TP / (TP + FN)**
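The picture-quiz numbers above can be double-checked in a few lines, using the TP/FP/FN counts from the story:

```python
# From the story: 12 dog pictures, she labels 10 as "dog", 2 of those are cats
tp = 8   # dogs correctly identified as dogs
fp = 2   # cats mistakenly identified as dogs
fn = 4   # dogs she missed (12 actual dogs - 8 found)

precision = tp / (tp + fp)   # 8 / 10
recall = tp / (tp + fn)      # 8 / 12

print(precision)             # 0.8
print(round(recall, 2))      # 0.67
```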

Which measure is more important?

It depends:

If you’re a dog lover, it is better to have a high precision; if you are afraid of dogs and want to avoid them, a higher recall is better 🙂

Precision is also called **Positive Predictive Value (PPV)**

Recall is often also called:

- True positive rate
- Sensitivity
- Probability of detection

Another common measure is Accuracy (ACC), the fraction of all correct predictions:

**ACC = (TP + TN) / (TP + FP + TN + FN)**

You can combine Precision and Recall into a measure called the F1 score. It is the harmonic mean of precision and recall:

**F1 = 2 / (1/Precision + 1/Recall)**
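Plugging in the numbers from the picture quiz, the harmonic mean works out like this:

```python
precision = 0.8      # from the example above: 8 / (8 + 2)
recall = 8 / 12      # 8 / (8 + 4)

# harmonic mean of precision and recall
f1 = 2 / (1 / precision + 1 / recall)
print(round(f1, 3))  # 0.727
```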

scikit-learn being a one-stop-shop for data scientists does of course offer functions for calculating precision and recall:

Let’s assume we trained a binary classifier which can tell us “dog” or “not-a-dog”:

from sklearn.metrics import precision_score
y_true = ["dog", "dog", "not-a-dog", "not-a-dog", "dog", "dog"]
y_pred = ["dog", "not-a-dog", "dog", "not-a-dog", "dog", "not-a-dog"]
print(precision_score(y_true, y_pred, pos_label="dog"))

In this example the precision is 0.666 or ~67%, because in two thirds of the cases the algorithm was right when it predicted a dog.

from sklearn.metrics import recall_score
print(recall_score(y_true, y_pred, pos_label="dog"))

The recall was just 0.5 or 50% because out of 4 dogs it just identified 2 correctly as dogs.

from sklearn.metrics import accuracy_score
print(accuracy_score(y_true, y_pred))

The accuracy was also just 50% because out of 6 items it made only 3 correct predictions.

from sklearn.metrics import f1_score
print(f1_score(y_true, y_pred, pos_label="dog"))

The F1 score is 0.57 – just between 0.5 and 0.666.

What other scores do you encounter? – stay tuned for the next episode 🙂


The hosts of this course are Sebastian Thrun, ex-Google X and founder of Udacity, and Katie Malone, creator of the Linear Digressions podcast.

The course consists of 17 lessons. Every lesson has a couple of hours of video and lots and lots of quizzes in it.

- [x] Lesson 1: Only introduction 🙂
- [x] Lesson 2: Naive Bayes
- [x] Lesson 3: Support Vector Machines
- [x] Lesson 4: Decision Trees
- [x] Lesson 5: Choose your own algorithm
- [ ] Lesson 6: Datasets and questions
- [ ] Lesson 7: Regression
- [ ] Lesson 8: Outliers
- [ ] Lesson 9: Clustering
- [ ] Lesson 10: Feature Scaling
- [ ] Lesson 11: Text Learning
- [ ] Lesson 12: Feature Selection
- [ ] Lesson 13: PCA
- [ ] Lesson 14: Validation
- [ ] Lesson 15: Evaluation Metrics
- [ ] Lesson 16: Tying it all together
- [ ] Lesson 17: Final project

For the mini projects you should fork https://github.com/udacity/ud120-projects and clone it. It is recommended to install a 64-bit Python 2.7 version, because ML is heavy data processing and can easily eat up more than 2 GB of memory.

After cloning the repo I would recommend setting up a venv and installing the requirements:

- sklearn
- numpy
- scipy
- matplotlib

The code itself is pretty straightforward:

- Instantiate the classifier
- Train (fit) the Classifier
- Predict
- Calculate accuracy

from time import time
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# training
print("Start training")
t0 = time()
clf = GaussianNB()
clf.fit(features_train, labels_train)
print("training time:", round(time() - t0, 3), "s")

# prediction
print("start predicting")
t0 = time()
prediction = clf.predict(features_test)
print("predict time:", round(time() - t0, 3), "s")

# accuracy
print("Calculating accuracy")
accuracy = accuracy_score(labels_test, prediction)
print("Accuracy calculated, and the accuracy is", accuracy)

The output on my machine:

Start training
training time: 1.762 s
start predicting
predict time: 0.286 s
Calculating accuracy
Accuracy calculated, and the accuracy is 0.9732650739476678

The simple Gaussian Naive Bayes is pretty accurate with 97.3%.

from time import time
from sklearn import svm
from sklearn.metrics import accuracy_score

# training
print("Start training")
t0 = time()
clf = svm.SVC(kernel="linear")
clf.fit(features_train, labels_train)
print("training time:", round(time() - t0, 3), "s")

# prediction
print("start predicting")
t0 = time()
prediction = clf.predict(features_test)
print("predict time:", round(time() - t0, 3), "s")

# accuracy
print("Calculating accuracy")
accuracy = accuracy_score(labels_test, prediction)
print("Accuracy calculated, and the accuracy is", accuracy)

When timing the training of the SVC, it’s astonishing how long it takes: around 2.5 minutes at 98.4% accuracy.

As an alternative you can use:

from sklearn.svm import LinearSVC
clf = LinearSVC(loss='hinge')

It gets you a result in 0.3 seconds with the same accuracy.

What’s the difference?

Parameter tuning

With the initial SVC we can play around with the parameters “C” and “kernel”.
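As a sketch of how those two parameters change the model, here is a small comparison on a synthetic dataset (the course uses its own email features, so the data here is just a stand-in):

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# synthetic stand-in data, not the course's email dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# compare a linear kernel against an RBF kernel,
# and a default C against a large C (less regularization)
for kernel, C in [("linear", 1.0), ("rbf", 1.0), ("rbf", 1000.0)]:
    clf = svm.SVC(kernel=kernel, C=C)
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(kernel, C, round(acc, 3))
```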

Kernels


Declaration

import numpy as np

a = np.array([1,2,3,4])
print(a)
[1 2 3 4]

Addition / Subtraction

a = np.array([1,2,3,4])
b = np.array([4,3,2,1])
a + b
array([5, 5, 5, 5])
a - b
array([-3, -1, 1, 3])

Scalar Multiplication

a = np.array([1,2,3,4])
a * 3
array([ 3,  6,  9, 12])

To see why it is charming to use numpy's array for this operation, you have to consider the alternative:

c = [1,2,3,4]
d = [x * 3 for x in c]

Dot Product

a = np.array([1,2,3,4])
b = np.array([4,3,2,1])
a.dot(b)
20  # 1*4 + 2*3 + 3*2 + 4*1
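The same dot product can also be written with `np.dot` or, in Python 3.5+, the `@` operator:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 3, 2, 1])

# three equivalent ways to compute 1*4 + 2*3 + 3*2 + 4*1 = 20
print(a.dot(b))     # 20
print(np.dot(a, b)) # 20
print(a @ b)        # 20
```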

Stay tuned for more algebraic stuff with numpy!

]]>