Materials from this workshop:
Train a Machine Learning Model using Scikit-Learn
Below is code that will load the Penguins dataset and immediately split 30% of the observations into a `test` dataframe and the remaining 70% of the observations into a `train` dataframe.
Your task is to use these two datasets to train a Decision Tree algorithm using Scikit-Learn step-by-step.
   species  island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex
0   Adelie       2            39.1           18.7              181.0       3750.0    0
1   Adelie       2            39.5           17.4              186.0       3800.0    1
2   Adelie       2            40.3           18.0              195.0       3250.0    1
3   Adelie       2            36.7           19.3              193.0       3450.0    1
4   Adelie       2            39.3           20.6              190.0       3650.0    0
(103, 7)
(239, 7)
Divide the `train` dataset into `X_train` and `y_train`
- Select the `species` column from the `train` dataframe and save the result to the variable `y_train`.
- Select the remaining columns from the `train` dataframe and save the result to the variable `X_train`.
Please note the lowercase `y` and capital `X` in `y_train` and `X_train`, respectively.
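One possible way to do this split is sketched below. The small `train` dataframe here is only a stand-in built from the five rows printed above so that the snippet runs on its own; in the notebook you would use the real `train` dataframe instead. Using `drop(columns=...)` for the features is one option among several, not necessarily the approach shown in the workshop.

```python
import pandas as pd

# Stand-in for the workshop's `train` dataframe (ASSUMPTION: the real one is
# created by the notebook's loading code). Rows copied from the printed preview.
train = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Adelie", "Adelie"],
    "island": [2, 2, 2, 2, 2],
    "bill_length_mm": [39.1, 39.5, 40.3, 36.7, 39.3],
    "bill_depth_mm": [18.7, 17.4, 18.0, 19.3, 20.6],
    "flipper_length_mm": [181.0, 186.0, 195.0, 193.0, 190.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0, 3450.0, 3650.0],
    "sex": [0, 1, 1, 1, 0],
})

# Target: the single column we want the model to predict.
y_train = train["species"]

# Features: every column except the target.
X_train = train.drop(columns=["species"])

print(X_train.shape, y_train.shape)
```

`drop` returns a new dataframe, so `train` itself is left unchanged.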
Divide the `test` dataset into `X_test` and `y_test`
- Select the `species` column from the `test` dataframe and save the result to the variable `y_test`.
- Select the remaining columns from the `test` dataframe and save the result to the variable `X_test`.
Please note the lowercase `y` and capital `X` in `y_test` and `X_test`, respectively.
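Because the same two selections are needed for every dataframe, they can also be wrapped in a small helper. The function name `split_features_target` is hypothetical (it is not part of pandas or Scikit-Learn), and the tiny `test` dataframe is a stand-in built from rows of the printed preview so the sketch runs by itself.

```python
import pandas as pd


def split_features_target(df, target="species"):
    """Return (X, y): all columns except `target`, and the `target` column."""
    X = df.drop(columns=[target])
    y = df[target]
    return X, y


# Stand-in for the workshop's `test` dataframe (ASSUMPTION: the real one
# comes from the notebook's loading code).
test = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie"],
    "island": [2, 2, 2],
    "bill_length_mm": [39.1, 39.5, 40.3],
    "bill_depth_mm": [18.7, 17.4, 18.0],
    "flipper_length_mm": [181.0, 186.0, 195.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0],
    "sex": [0, 1, 1],
})

X_test, y_test = split_features_target(test)
print(X_test.shape, y_test.shape)
```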
Train (fit) a `DecisionTreeClassifier` model
Refer back to the code that we wrote together during the workshop to do the following:
- Import the `DecisionTreeClassifier` from `sklearn.tree`.
- Store a `DecisionTreeClassifier` in a variable called `model`.
- Use the model's `.fit()` method to train the algorithm. You'll need to pass the algorithm your training data, `X_train` and `y_train`, in that specific order.
This three-step process will be nearly identical for any Scikit-Learn algorithm that we choose to use.
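The three steps might look roughly like the sketch below. The toy `X_train` and `y_train` are assumptions made so the snippet is runnable on its own: the Adelie measurements come from the printed preview, while the Gentoo and Chinstrap rows are made-up illustrative values, not real data. Setting `random_state=42` is also an assumption, added only to make repeated runs reproducible.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier  # step 1: import

# Toy stand-ins for the real X_train / y_train (ASSUMPTION, illustration only).
X_train = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 46.1, 50.0, 46.5, 50.5],
    "flipper_length_mm": [181.0, 186.0, 211.0, 230.0, 192.0, 201.0],
})
y_train = pd.Series(
    ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"],
    name="species",
)

# Step 2: create the (untrained) model and store it in a variable.
model = DecisionTreeClassifier(random_state=42)

# Step 3: train it. Features first, target second.
model.fit(X_train, y_train)

print(model.classes_)
```

After fitting, `model.classes_` lists the labels the tree has learned to predict.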
Check the model's accuracy using the `X_test` and `y_test` data.
- You can do this manually using the `.predict()` method and then calculating the percentage of correct predictions, or
- You can use the model's `.score()` method (this is much easier).
Please save the model's accuracy on the test data to a variable called `dt_accuracy`.
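Both routes are sketched below so they can be compared side by side. The training and test rows are toy stand-ins (Adelie values from the printed preview, the other species made up for illustration), and `manual_accuracy` is a hypothetical variable name; with the real data the resulting number will differ.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy stand-ins for the real train/test splits (ASSUMPTION, illustration only).
X_train = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 40.3, 46.1, 50.0, 48.7, 46.5, 50.5, 49.0],
    "flipper_length_mm": [181.0, 186.0, 195.0, 211.0, 230.0, 222.0, 192.0, 201.0, 195.0],
})
y_train = pd.Series(["Adelie"] * 3 + ["Gentoo"] * 3 + ["Chinstrap"] * 3, name="species")

X_test = pd.DataFrame({
    "bill_length_mm": [36.7, 47.5, 51.3],
    "flipper_length_mm": [193.0, 215.0, 198.0],
})
y_test = pd.Series(["Adelie", "Gentoo", "Chinstrap"], name="species")

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Option 1: predict, then compute the fraction of correct predictions by hand.
predictions = model.predict(X_test)
manual_accuracy = float((predictions == y_test.to_numpy()).mean())

# Option 2: let the model do both steps; for classifiers, .score() returns
# the mean accuracy on the given data.
dt_accuracy = model.score(X_test, y_test)

print(manual_accuracy, dt_accuracy)
```

The two numbers should agree, since `.score()` on a classifier computes the same fraction of correct predictions.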
STRETCH GOAL CHALLENGE!
Stretch Goal Challenges are "extra credit" tasks that weren't necessarily taught to you during our live workshop but that ask you to do your own research and go a little bit above and beyond on your own.
Can you...
Train a different machine learning algorithm from Scikit-Learn on the same data? Try using a `RandomForestClassifier`.
You'll follow exactly the same steps that we did for the Decision Tree Classifier.
Save your `RandomForestClassifier`'s final accuracy to a variable called `rf_accuracy`.
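As a starting point, a sketch of the same import, create, fit, and score steps with a random forest follows. `RandomForestClassifier` lives in `sklearn.ensemble` rather than `sklearn.tree`. The variable name `rf_model`, the settings `n_estimators=100` and `random_state=42`, and the toy data (partly made-up rows) are all assumptions for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy stand-ins for the real train/test splits (ASSUMPTION, illustration only).
X_train = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 40.3, 46.1, 50.0, 48.7, 46.5, 50.5, 49.0],
    "flipper_length_mm": [181.0, 186.0, 195.0, 211.0, 230.0, 222.0, 192.0, 201.0, 195.0],
})
y_train = pd.Series(["Adelie"] * 3 + ["Gentoo"] * 3 + ["Chinstrap"] * 3, name="species")

X_test = pd.DataFrame({
    "bill_length_mm": [36.7, 47.5, 51.3],
    "flipper_length_mm": [193.0, 215.0, 198.0],
})
y_test = pd.Series(["Adelie", "Gentoo", "Chinstrap"], name="species")

# Same pattern as the decision tree: create, fit, score.
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
rf_accuracy = rf_model.score(X_test, y_test)

print(rf_accuracy)
```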