To validate your model, you need to split your data into a training set and a test set.
More training data generally gives a better model, while more test data gives a more reliable estimate of how well that model performs.
Because the total amount of data is limited, you have to decide on the ratio in which to split it between training and testing.
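The trade-off can be made concrete with a little arithmetic. The sketch below is only an illustration: `split_sizes` is a hypothetical helper (not a scikit-learn function), the dataset size of 150 matches the iris data used in this post, and the true accuracy of 0.9 is an assumed value chosen purely to show how the uncertainty of the test estimate shrinks as the test set grows.

```python
import math

def split_sizes(n_samples, test_fraction):
    """Hypothetical helper: return (n_train, n_test) for a given test fraction."""
    n_test = round(n_samples * test_fraction)  # simple rounding, for illustration
    return n_samples - n_test, n_test

def accuracy_std_error(assumed_accuracy, n_test):
    """Binomial standard error of an accuracy estimate measured on n_test samples."""
    return math.sqrt(assumed_accuracy * (1 - assumed_accuracy) / n_test)

ASSUMED_ACCURACY = 0.9  # assumption, not a measured result

for fraction in (0.1, 0.2, 0.4):
    n_train, n_test = split_sizes(150, fraction)
    error = accuracy_std_error(ASSUMED_ACCURACY, n_test)
    print(f"test fraction {fraction}: train={n_train}, test={n_test}, "
          f"std. error of accuracy ~ {error:.3f}")
```

A larger test fraction narrows the error on the reported accuracy, but leaves fewer samples to learn from.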
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn import svm

# Load the iris data set: 150 samples with 4 features each
iris = datasets.load_iris()
iris.data.shape, iris.target.shape
Sample a training set while holding out 40% of the data for testing:
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)
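Once the data is split, the model is fitted on the training part only and scored on the held-out part. The following is a minimal sketch of that step, assuming the same linear `SVC` that this post uses for cross-validation. The variable name `test_accuracy` is introduced here for illustration.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Hold out 40% of the samples for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)

# Fit on the training set only, then measure accuracy on the unseen test set
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)

print(X_train.shape, X_test.shape)
print(f"accuracy on held-out data: {test_accuracy:.2f}")
```

Because the test samples play no part in fitting, the score reflects how the model handles data it has not seen.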
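Instead of relying on a single split, estimate the accuracy with 5-fold cross-validation, which returns one score per fold:
from sklearn.model_selection import cross_val_score

clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, iris.data, iris.target, cv=5)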
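To see what `cv=5` means, it can help to build the folds by hand. The sketch below is a simplified illustration in plain Python, not scikit-learn's implementation: `k_fold_indices` is a hypothetical helper that cuts the sample indices into contiguous folds, whereas scikit-learn stratifies the folds by class when the estimator is a classifier.

```python
def k_fold_indices(n_samples, k):
    """Hypothetical helper: yield (train_idx, test_idx) for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# 150 samples, as in the iris data set
folds = list(k_fold_indices(150, 5))
for i, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {i}: train={len(train_idx)}, test={len(test_idx)}")
```

Every sample lands in the test set exactly once, so each of the five scores is computed on data the model did not train on in that round.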
Five Minutes with Ingo: Cross Validation