Data Science: Cross-Validation

To validate your model, you need to split your data into a training set and a test set.

More training data generally gives a better model; more test data gives a more reliable estimate of how well that model performs.

Because the total amount of data is limited, you have to decide on the ratio in which to split it between training and testing.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn import svm

# Load the iris data set: 150 samples with 4 features each
iris = datasets.load_iris()
print(iris.data.shape, iris.target.shape)  # (150, 4) (150,)

Sample a training set while holding out 40% of the data for testing:

X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)
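
To see what the hold-out split is good for, here is a minimal sketch that fits a classifier on the training part and scores it on the held-out part. It assumes the same linear SVC that this post uses for cross-validation; the variable name holdout_accuracy is only illustrative.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)

# 60% of the 150 samples go to training, 40% are held out for testing
print(X_train.shape, X_test.shape)

# Fit on the training part only, then measure accuracy on the held-out part
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
holdout_accuracy = clf.score(X_test, y_test)
print(holdout_accuracy)
```

The resulting number depends on which samples happened to land in the test set, which is the weakness of relying on a single split.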
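
Instead of relying on a single split, cross-validation repeats the split several times. Here the data is divided into 5 folds, and each fold serves once as the test set while the model is trained on the other four:

from sklearn.model_selection import cross_val_score

clf = svm.SVC(kernel='linear', C=1)
# Returns one score per fold
scores = cross_val_score(clf, iris.data, iris.target, cv=5)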
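
The per-fold scores are usually condensed into a single figure. A small sketch of one common way to do that, using the mean as the overall estimate and the standard deviation as a rough indication of how much the folds disagree (the names mean_accuracy and std_accuracy are illustrative):

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)

# One accuracy value per fold; cv=5 gives five values
scores = cross_val_score(clf, iris.data, iris.target, cv=5)
print(scores)

# Mean as the overall estimate, standard deviation as a rough spread
mean_accuracy = scores.mean()
std_accuracy = scores.std()
print("%0.2f accuracy with a standard deviation of %0.2f"
      % (mean_accuracy, std_accuracy))
```

A small standard deviation suggests the estimate does not hinge on one particular split; a large one is a hint that more data or more folds may be needed.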

Five Minutes with Ingo: Cross Validation

 
