Linear Regression with sklearn – cheat sheet

# import and instantiate model
from sklearn.linear_model import LinearRegression
model = LinearRegression()

#prepare test data
features_train = df_train.loc[:, 'feature_name']
target_train = df_train.loc[:, 'target_name']

#fit (train) model and print coefficient and intercept
model.fit(features_train , target_train )
print(model.coef_)
print(model.intercept_)

# calculate model quality
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

target_prediction = model.predict(features_train)
print(mean_squared_error(target_train , target_prediction))
print(r2_score(target_train , target_prediction))

# test predictions
features_test = df_train.loc[:, 'feature_name'] 
target_test = df_train.loc[:, 'target_name']
target_prediction_test = model.predict(features_test) 
print(mean_squared_error(target_test, target_prediction_test )) 
print(r2_score(target_test, target_prediction_test ))

Data Science Datasets: Iris flower data set

The Iris flower data set or Fisher’s Iris data set became a typical test case for many statistical classification techniques in machine learning such as support vector machines.

It is sometimes called Anderson’s Iris data set because Edgar Anderson collected the data to quantify the morphological variation of Iris flowers of three related species.

This data set can be imported from scikit-learn like the following:

from sklearn import datasets

iris = datasets.load_iris()
iris.data.shape, iris.target.shape