What is Feature Scaling?
Feature scaling is an important pre-processing step for some machine learning algorithms.
Imagine you have three friends whose individual heights and weights you know.
You would like to deduce Chris's T-shirt size from Cameron's and Sarah's by looking at their heights and weights.
| Name | Height in m | Weight in kg | T-Shirt size |
| --- | --- | --- | --- |
One way you could determine the shirt size is simply to add up the height and the weight of each friend. You would get:
| Name | Height + weight | T-Shirt size |
| --- | --- | --- |
Because Chris's height + weight number is nearer to Sarah's than to Cameron's, Chris should apparently wear a small T-shirt. What? Something is off: height, measured in metres, is numerically tiny next to weight, measured in kilograms, so the sum is dominated almost entirely by the weight.
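A quick sketch makes the imbalance concrete. The table's values are not reproduced here, so these heights and weights are made up purely for illustration:

```python
# Hypothetical heights (m) and weights (kg) -- illustrative only,
# not the article's actual table values.
people = {"Sarah": (1.60, 52.0), "Chris": (1.75, 64.0), "Cameron": (1.85, 79.0)}

for name, (height, weight) in people.items():
    # Naive "height + weight" score: the weight (tens of kg)
    # completely drowns out the height (under two metres).
    print(name, height + weight)
```

Whichever plausible heights you pick, the score ordering is decided by the weights alone.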
Feature Scaling Formula
x' = (x - x_min) / (x_max - x_min)

After scaling, every value x' lies between 0 (for the minimum) and 1 (for the maximum), so differently-ranged features become comparable.
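Applying the formula by hand in Python, using the same three weights (in kg) that appear in the sklearn snippet further down:

```python
weights = [52.0, 79.0, 64.0]  # kg
x_min, x_max = min(weights), max(weights)

# x' = (x - x_min) / (x_max - x_min): smallest maps to 0, largest to 1
scaled = [(x - x_min) / (x_max - x_min) for x in weights]
print(scaled)  # smallest -> 0.0, largest -> 1.0, 64 kg -> 12/27 ≈ 0.444
```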
| Name | Scaled Height | Scaled Weight | T-Shirt size |
| --- | --- | --- | --- |
```python
def featureScaling(arr):
    ret_arr = []
    min_val = min(arr)
    max_val = max(arr)
    if min_val == max_val:
        # All values are identical: (max - min) would be zero
        raise ZeroDivisionError()
    for f in arr:
        ret_arr.append((f - min_val) / float(max_val - min_val))
    return ret_arr
```
MinMaxScaler from sklearn
Instead of writing our own feature scaler, we should use the MinMaxScaler from sklearn:
```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# MinMaxScaler expects a 2-D array: one row per sample, one column per feature
weights = np.array([[52.0], [79.0], [64.0]])
scaler = MinMaxScaler()
rescaled_weight = scaler.fit_transform(weights)
print(rescaled_weight)
```
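Once fitted, the scaler remembers the minimum and maximum it saw, so new measurements can be scaled consistently and scaled values can be mapped back with `inverse_transform`:

```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

weights = np.array([[52.0], [79.0], [64.0]])
scaler = MinMaxScaler()
scaler.fit(weights)  # learns min = 52, max = 79

# A new 70 kg measurement, scaled with the range learned above:
print(scaler.transform(np.array([[70.0]])))  # (70 - 52) / 27 = 2/3
# Map a scaled value back to kilograms:
print(scaler.inverse_transform(np.array([[0.5]])))  # 52 + 0.5 * 27 = 65.5
```

Note that values outside the fitted range (say, 90 kg) would scale to a number above 1; MinMaxScaler does not clip by default.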
Which algorithms are affected by improperly scaled features?
SVM and k-means. Both rely on distances between data points, so without scaling the feature with the largest numeric range dominates the result.
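To see how lopsided an unscaled distance is, compare each feature's contribution to the squared Euclidean distance between two points. The numbers below are made up for illustration (they are not the article's table values):

```python
# Hypothetical height in m and weight in kg for two friends
chris = (1.75, 64.0)
cameron = (1.85, 79.0)

dh = cameron[0] - chris[0]  # height difference: 0.10 m
dw = cameron[1] - chris[1]  # weight difference: 15.0 kg

# Height's share of the squared Euclidean distance between the two points:
height_share = dh**2 / (dh**2 + dw**2)
print(height_share)  # tiny: the distance is effectively weight-only
```

After min-max scaling, both differences land in [0, 1] and contribute on a comparable footing, which is why SVM and k-means can behave very differently on scaled versus unscaled data.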