Lesson 10: Feature Scaling

What is Feature Scaling?

Feature Scaling is an important pre-processing step for some machine learning algorithms.

Imagine you have three friends whose height and weight you know.

You would like to deduce Chris’ T-shirt size from Cameron’s and Sarah’s by looking at the height and weight.

Name      Height in m   Weight in kg   T-Shirt size
Sarah     1.58          52             Small
Cameron   1.79          79             Large
Chris     1.86          64             ?

One way you could determine the shirt size is to just add up the weight and the height of each friend. You would get:

Name      Height + weight   T-Shirt size
Sarah     53.58             Small
Cameron   80.79             Large
Chris     65.86             ?

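Because Chris’ height + weight number is nearer to Sarah’s than to Cameron’s, Chris should wear a small T-Shirt. What? Chris is the tallest of the three. The sum is dominated by weight: the weights differ by tens of kilograms, while the heights differ by only fractions of a meter, so height barely counts.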
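The arithmetic above can be reproduced in a few lines of Python. This is only a minimal sketch: the names `friends`, `raw_sum` and `nearest_raw` are made up for illustration, and the "nearest sum" rule is the toy method from the text, not a real classifier.

```python
# Heights in m and weights in kg, copied from the table above.
friends = {
    "Sarah": (1.58, 52.0),
    "Cameron": (1.79, 79.0),
    "Chris": (1.86, 64.0),
}

# Naive score: add the raw height and the raw weight of each friend.
raw_sum = {name: height + weight for name, (height, weight) in friends.items()}

# Compare Chris with the two friends whose shirt size is known.
known = ["Sarah", "Cameron"]
nearest_raw = min(known, key=lambda name: abs(raw_sum[name] - raw_sum["Chris"]))

print(raw_sum)
print(nearest_raw)  # "Sarah", the friend with the small T-Shirt
```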

Feature Scaling Formula

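x' = (x - x_min) / (x_max - x_min)

Here x_min and x_max are the smallest and largest values of the feature, so every rescaled value x' lies between 0 and 1.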

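Applying the formula to each feature separately, with the minimum and maximum taken over the three friends, gives:

Feature   min    max
Height    1.58   1.86
Weight    52     79

Name      Scaled Height   Scaled Weight   T-Shirt size
Sarah     0               0               Small
Cameron   0.75            1               Large
Chris     1               0.44            ?

Both features now lie between 0 and 1, so neither dominates the sum. Chris’ scaled sum (1.44) is nearer to Cameron’s (1.75) than to Sarah’s (0), which points to a large T-Shirt.

A plain Python implementation of the formula:
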
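def featureScaling(arr):
    """Rescale a list of numbers to the range [0, 1] using min-max scaling."""
    min_val = min(arr)
    max_val = max(arr)
    # If all values are equal, the denominator would be zero.
    if min_val == max_val:
        raise ZeroDivisionError("cannot scale: all values are identical")
    value_range = float(max_val - min_val)
    return [(x - min_val) / value_range for x in arr]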
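To see what scaling does to the sum-based guess, the sketch below rescales both features and repeats the comparison. It is self-contained, so it defines its own small helper `min_max` as a stand-in for `featureScaling`; that helper and the variable names are assumptions made for this example.

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1]; stand-in for featureScaling."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are equal, cannot rescale")
    return [(v - lo) / (hi - lo) for v in values]

names = ["Sarah", "Cameron", "Chris"]
heights = [1.58, 1.79, 1.86]  # m
weights = [52.0, 79.0, 64.0]  # kg

scaled_heights = min_max(heights)
scaled_weights = min_max(weights)

# Same naive score as in the text, but computed on the rescaled features.
scaled_sum = {n: h + w for n, h, w in zip(names, scaled_heights, scaled_weights)}

nearest_scaled = min(
    ["Sarah", "Cameron"],
    key=lambda n: abs(scaled_sum[n] - scaled_sum["Chris"]),
)

print(scaled_sum)
print(nearest_scaled)  # "Cameron", the friend with the large T-Shirt
```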

MinMaxScaler from sklearn

Instead of writing our own feature scaler, we can use the MinMaxScaler from sklearn:

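from sklearn.preprocessing import MinMaxScaler
import numpy as np

# MinMaxScaler expects a 2-D array: one row per sample, one column per feature.
weights = np.array([[52.0], [79.0], [64.0]])
scaler = MinMaxScaler()
# fit_transform learns the min and max, then rescales the data to [0, 1].
rescaled_weight = scaler.fit_transform(weights)
print(rescaled_weight)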
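MinMaxScaler works column by column, so height and weight can be rescaled in one call by passing both as columns of the same array. The following is a sketch under that assumption; the variable name `features` is made up for this example, while `data_min_` and `data_max_` are attributes the fitted scaler exposes.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One row per friend (Sarah, Cameron, Chris);
# the columns are height in m and weight in kg.
features = np.array([
    [1.58, 52.0],
    [1.79, 79.0],
    [1.86, 64.0],
])

scaler = MinMaxScaler()  # the default feature_range is (0, 1)
rescaled = scaler.fit_transform(features)

print(scaler.data_min_)  # per-column minimum seen during fit
print(scaler.data_max_)  # per-column maximum seen during fit
print(rescaled)
```

Up to floating-point rounding, the rows of `rescaled` should match the scaled height and weight values worked out by hand.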

Affected Algorithms

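Which algorithms are affected by improperly scaled features?

SVM and k-means. Both rely on distances computed across all features at once, so a feature with a large numeric range can dominate the result.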
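A rough way to see the effect on distance-based methods is to compute plain Euclidean distances on the T-shirt data, before and after min-max scaling. This is only an illustration of the distance calculation, not an SVM or k-means run; the helpers `min_max_columns` and `nearest_to_chris` are hypothetical names introduced here, and `math.dist` requires Python 3.8 or later.

```python
import math

# (height in m, weight in kg)
raw = {"Sarah": (1.58, 52.0), "Cameron": (1.79, 79.0), "Chris": (1.86, 64.0)}

def min_max_columns(points):
    """Min-max scale each coordinate across all points.

    Assumes no coordinate is constant (otherwise this divides by zero).
    """
    names = list(points)
    columns = list(zip(*points.values()))
    scaled_columns = [
        [(v - min(col)) / (max(col) - min(col)) for v in col] for col in columns
    ]
    return {
        name: tuple(col[i] for col in scaled_columns)
        for i, name in enumerate(names)
    }

def nearest_to_chris(points):
    """Return whichever of Sarah or Cameron is closest to Chris."""
    return min(
        ["Sarah", "Cameron"],
        key=lambda n: math.dist(points[n], points["Chris"]),
    )

nearest_unscaled = nearest_to_chris(raw)
nearest_rescaled = nearest_to_chris(min_max_columns(raw))
print(nearest_unscaled, nearest_rescaled)  # Sarah Cameron
```

On the raw data the distance is almost entirely the weight difference, so Chris ends up next to Sarah; after scaling, height contributes on equal terms and Chris ends up next to Cameron.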