In machine learning, a test set is a separate dataset used to assess the performance and generalization ability of a trained machine learning model. The test set is distinct from the training set, which is the data used to train the model. The primary purpose of the test set is to evaluate how well the model can make predictions on new, previously unseen data.

Training and evaluating a machine learning model typically involves several datasets. A training set and a test set are the minimum, and a validation set is a recommended third:

  • Training Set: This is the portion of the dataset used to train the machine learning model. The model learns patterns and relationships in the training data by adjusting its parameters (weights and biases) based on a specified objective, usually minimizing a loss function.
  • Validation Set (Best Practice): A separate validation set is held out from the training data and used during the training process to tune hyperparameters and to detect overfitting. Because the model's parameters are never fitted to it, it gives an early estimate of performance on unseen data.
  • Test Set: Once the model has been trained and, where applicable, validated, it is evaluated on a completely independent dataset called the test set. The test set contains examples that the model has never seen during training or validation. Evaluating the model on this data gives a more realistic assessment of its ability to generalize than its training or validation scores do.
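
The three roles above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the synthetic one-dimensional data, the toy k-nearest-neighbours classifier, the candidate values of k, and the helper names `knn_predict` and `accuracy`. The training set supplies the examples the model predicts from, the validation set is used only to choose the hyperparameter k, and the test set is touched once at the end.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0

def accuracy(train, examples, k):
    """Fraction of examples whose label the k-NN model predicts correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in examples)
    return correct / len(examples)

# Hypothetical synthetic data: label is 1 when x > 0.5, with ~10% label noise.
rng = random.Random(0)
points = []
for _ in range(300):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    points.append((x, y))

# The points are already in random order, so slicing gives disjoint random sets.
train, val, test = points[:200], points[200:250], points[250:]

# Hyperparameter selection uses the validation set only.
candidate_ks = [1, 3, 5, 9, 15]
best_k = max(candidate_ks, key=lambda k: accuracy(train, val, k))

# The test set is used once, after all choices have been made.
test_accuracy = accuracy(train, test, best_k)
print(f"chosen k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

If the test set had been used to pick k, the reported accuracy would no longer be an estimate on unseen data, which is the reason the validation set exists.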

Splitting a dataset into training, validation, and test sets is crucial to ensure that the model's performance metrics accurately reflect its generalization capabilities. A common practice is to use the majority of the data for training, a smaller portion for validation, and a separate portion for testing, for example 70/15/15 or 80/10/10. The exact split ratio depends on the size of the dataset and on specific requirements; very large datasets can often afford much smaller validation and test fractions.
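
As a minimal sketch of such a split, the function below shuffles the examples and cuts them into three disjoint lists. The function name `train_val_test_split`, the default 15% fractions, and the fixed seed are illustrative assumptions rather than a prescribed recipe, and libraries such as scikit-learn provide their own splitting utilities.

```python
import random

def train_val_test_split(examples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle examples and return disjoint (train, validation, test) lists."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    shuffled = list(examples)              # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split

    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]      # everything left over is training data
    return train, val, test

data = list(range(100))
train_part, val_part, test_part = train_val_test_split(data)
print(len(train_part), len(val_part), len(test_part))
```

Shuffling before cutting matters when the data is stored in some order (by date or by class, for instance); for time series or grouped data a plain random shuffle may not be appropriate.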

By evaluating a model on a test set, machine learning practitioners can judge how well it performs and how likely it is to make accurate predictions on new, real-world data.