Train a Machine Learning Model using Scikit-Learn

Below is code that will load the Penguins dataset and immediately split 30% of the observations into a test dataframe, and the remaining 70% of the observations into a train dataframe.
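The loading cell itself is not reproduced here. As a rough sketch of what a 70/30 split can look like, the example below uses train_test_split from Scikit-Learn. The ten rows are hypothetical stand-in values rather than the real Penguins data, the column names are assumed to follow the usual penguins naming, and random_state is an arbitrary seed.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in rows; the workshop cell loads the full Penguins dataset.
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Adelie",
                "Chinstrap", "Chinstrap", "Chinstrap",
                "Gentoo", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.1, 39.5, 40.3, 36.7, 46.5, 50.0, 51.3, 46.1, 50.0, 48.7],
    "flipper_length_mm": [181, 186, 195, 193, 192, 196, 193, 211, 230, 210],
    "body_mass_g": [3750, 3800, 3250, 3450, 3500, 3900, 3650, 4500, 5700, 4450],
})

# Hold out 30% of the rows for testing and keep the other 70% for training.
# random_state is an arbitrary seed so the split is repeatable.
train, test = train_test_split(penguins, test_size=0.3, random_state=42)

print(len(train), len(test))
```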

Your task is to use these two datasets to train a Decision Tree model with Scikit-Learn, step by step.

Divide the train dataset into X_train and y_train

  • Select the species column from the train dataframe and save the result to the variable y_train.
  • Select the remaining columns from the train dataframe and save the result to the variable X_train.

Please note the lowercase y and capital X in y_train and X_train, respectively.
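One possible way to do this with pandas is sketched below. The train dataframe here is a hypothetical stand-in with made-up rows, and it assumes every feature column is numeric with no missing values.

```python
import pandas as pd

# Hypothetical stand-in for the workshop's train dataframe.
train = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.1, 39.5, 36.7, 46.5, 50.0, 46.1, 50.0],
    "flipper_length_mm": [181, 186, 193, 192, 196, 211, 230],
    "body_mass_g": [3750, 3800, 3450, 3500, 3900, 4500, 5700],
})

# Target: the single column we want to predict (a pandas Series).
y_train = train["species"]

# Features: every column except the target (a pandas DataFrame).
X_train = train.drop(columns="species")
```

Using drop(columns=...) returns a new dataframe and leaves train itself unchanged, so the cell can be re-run safely.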

Divide the test dataset into X_test and y_test

  • Select the species column from the test dataframe and save the result to the variable y_test.
  • Select the remaining columns from the test dataframe and save the result to the variable X_test.

Please note the lowercase y and capital X in y_test and X_test, respectively.
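Because the test split repeats the same two selections, one option is to wrap them in a small helper. The function name split_features_target is hypothetical (it is not part of pandas or Scikit-Learn), and the test rows are made-up stand-in values.

```python
import pandas as pd

def split_features_target(df, target="species"):
    """Return (features, target) for a dataframe; hypothetical helper."""
    features = df.drop(columns=target)  # everything except the target column
    labels = df[target]                 # the target column on its own
    return features, labels

# Hypothetical stand-in for the workshop's test dataframe.
test = pd.DataFrame({
    "species": ["Adelie", "Chinstrap", "Gentoo"],
    "bill_length_mm": [40.3, 51.3, 48.7],
    "flipper_length_mm": [195, 193, 210],
    "body_mass_g": [3250, 3650, 4450],
})

X_test, y_test = split_features_target(test)
```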

Train (fit) a DecisionTreeClassifier model

Refer back to the code that we wrote together during the workshop to do the following:

  • Import DecisionTreeClassifier from sklearn.tree.
  • Store a DecisionTreeClassifier instance in a variable called model.
  • Use the model's .fit() method to train the algorithm. You'll need to pass it your training data, X_train and y_train, in that specific order.

This three-step process will be nearly identical for any Scikit-Learn algorithm that we choose to use.
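The three steps can be sketched as follows. This is not necessarily the exact code from the workshop; the training rows are hypothetical stand-in values, and the features are assumed to be numeric with no missing values.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier  # step 1: import

# Hypothetical stand-in training data.
train = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.1, 39.5, 36.7, 46.5, 50.0, 46.1, 50.0],
    "flipper_length_mm": [181, 186, 193, 192, 196, 211, 230],
    "body_mass_g": [3750, 3800, 3450, 3500, 3900, 4500, 5700],
})
X_train = train.drop(columns="species")
y_train = train["species"]

model = DecisionTreeClassifier()  # step 2: create the (untrained) model
model.fit(X_train, y_train)       # step 3: train it, features first, then target
```

Swapping in another Scikit-Learn classifier would typically change only the import and the line that creates the model.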

Check the model's accuracy using the X_test and y_test data

  • You can do this manually by calling the .predict() method and then calculating the percentage of correct predictions, or
  • You can use the model's .score() method (this is much easier).

Please save the model's accuracy on the test data to a variable called dt_accuracy.
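Both routes are sketched below so they can be compared. The train and test rows are hypothetical stand-in values, so the number printed here says nothing about the accuracy to expect on the real Penguins data.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data with numeric features only.
train = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.1, 39.5, 36.7, 46.5, 50.0, 46.1, 50.0],
    "flipper_length_mm": [181, 186, 193, 192, 196, 211, 230],
})
test = pd.DataFrame({
    "species": ["Adelie", "Chinstrap", "Gentoo"],
    "bill_length_mm": [40.3, 51.3, 48.7],
    "flipper_length_mm": [195, 193, 210],
})
X_train, y_train = train.drop(columns="species"), train["species"]
X_test, y_test = test.drop(columns="species"), test["species"]

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Manual route: predict, then take the fraction of predictions that match.
predictions = model.predict(X_test)
manual_accuracy = (y_test == predictions).mean()

# Shortcut: for classifiers, .score() returns the mean accuracy directly.
dt_accuracy = model.score(X_test, y_test)

print(manual_accuracy, dt_accuracy)
```

Note that .score() takes the test features and the true labels, not the predictions; it calls .predict() internally.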

STRETCH GOAL CHALLENGE!

Stretch Goal Challenges are "extra credit" tasks that weren't necessarily taught during our live workshop. They ask you to do your own research and go a little above and beyond on your own.

Can you...

Train a different machine learning algorithm from Scikit-Learn on the same data? Try using a RandomForestClassifier.

You'll follow exactly the same steps that we did for the Decision Tree Classifier.

Save your RandomForestClassifier's final accuracy to a variable called rf_accuracy.
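A minimal sketch of the same workflow with a random forest is shown below. RandomForestClassifier lives in sklearn.ensemble rather than sklearn.tree. The data rows are hypothetical stand-in values, and random_state=0 is an arbitrary seed added here only to make the result repeatable.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data with numeric features only.
train = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.1, 39.5, 36.7, 46.5, 50.0, 46.1, 50.0],
    "flipper_length_mm": [181, 186, 193, 192, 196, 211, 230],
})
test = pd.DataFrame({
    "species": ["Adelie", "Chinstrap", "Gentoo"],
    "bill_length_mm": [40.3, 51.3, 48.7],
    "flipper_length_mm": [195, 193, 210],
})
X_train, y_train = train.drop(columns="species"), train["species"]
X_test, y_test = test.drop(columns="species"), test["species"]

# Same pattern as the decision tree: create the model, fit it, then score it.
rf_model = RandomForestClassifier(random_state=0)
rf_model.fit(X_train, y_train)
rf_accuracy = rf_model.score(X_test, y_test)

print(rf_accuracy)
```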
