Materials from this workshop:
Train a Machine Learning Model using Scikit-Learn
Below is code that will load the Penguins dataset and immediately split 30% of the observations into a test dataframe, and the remaining 70% of the observations into a train dataframe.
Your task is to use these two datasets to train a Decision Tree algorithm using Scikit-Learn step-by-step.
  species  island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex
0  Adelie       2            39.1           18.7              181.0       3750.0    0
1  Adelie       2            39.5           17.4              186.0       3800.0    1
2  Adelie       2            40.3           18.0              195.0       3250.0    1
3  Adelie       2            36.7           19.3              193.0       3450.0    1
4  Adelie       2            39.3           20.6              190.0       3650.0    0
(103, 7)
(239, 7)
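A 70/30 split like the one described above can be sketched with Scikit-Learn's train_test_split. This is only an illustration under stated assumptions: the small dataframe is made-up stand-in data (not the Penguins dataset), and the random_state value is an arbitrary choice.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in rows (NOT the real Penguins data) with the same column names.
df = pd.DataFrame({
    "species": ["Adelie", "Gentoo"] * 5,
    "island": [2, 0] * 5,
    "bill_length_mm": [39.1, 46.1, 39.5, 50.0, 40.3, 48.7, 36.7, 50.0, 39.3, 47.6],
    "bill_depth_mm": [18.7, 13.2, 17.4, 16.3, 18.0, 14.1, 19.3, 15.2, 20.6, 14.5],
    "flipper_length_mm": [181.0, 211.0, 186.0, 230.0, 195.0, 210.0, 193.0, 218.0, 190.0, 215.0],
    "body_mass_g": [3750.0, 4500.0, 3800.0, 5700.0, 3250.0, 4450.0, 3450.0, 5700.0, 3650.0, 5400.0],
    "sex": [0, 1, 1, 0, 1, 1, 1, 0, 0, 0],
})

# Hold out 30% of the rows for testing. random_state is an arbitrary value
# that only makes the shuffle repeatable.
train, test = train_test_split(df, test_size=0.3, random_state=42)

print(test.shape)
print(train.shape)
```

Every row ends up in exactly one of the two dataframes, so the row counts of train and test add up to the row count of the original.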
Divide the train dataset into X_train and y_train
- Select the species column from the train dataframe and save the result to the variable y_train.
- Select the remaining columns from the train dataframe and save the result to the variable X_train.
Please note the lowercase y and capital X in y_train and X_train, respectively.
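As a rough sketch of this kind of column selection in pandas, the snippet below separates a target column from the feature columns. The train dataframe here is a tiny made-up stand-in so the example runs on its own, and using drop() is just one of several ways to select "the remaining columns".

```python
import pandas as pd

# Tiny made-up stand-in for the train dataframe (not the real data).
train = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "island": [2, 2, 0, 0],
    "bill_length_mm": [39.1, 39.5, 46.1, 50.0],
    "bill_depth_mm": [18.7, 17.4, 13.2, 16.3],
    "flipper_length_mm": [181.0, 186.0, 211.0, 230.0],
    "body_mass_g": [3750.0, 3800.0, 4500.0, 5700.0],
    "sex": [0, 1, 1, 0],
})

# The target is a single column, so it comes back as a pandas Series.
y_train = train["species"]

# The features are every column except the target. drop() returns a new
# dataframe and leaves train itself unchanged.
X_train = train.drop(columns="species")

print(X_train.shape, y_train.shape)
```

The same two lines work for the test dataframe by swapping train for test.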
Divide the test dataset into X_test and y_test
- Select the species column from the test dataframe and save the result to the variable y_test.
- Select the remaining columns from the test dataframe and save the result to the variable X_test.
Please note the lowercase y and capital X in y_test and X_test, respectively.
Train (fit) a DecisionTreeClassifier model
Refer back to the code that we wrote together during the workshop to do the following:
- Import the DecisionTreeClassifier from sklearn.tree.
- Store a DecisionTreeClassifier in a variable called model.
- Use the model's .fit() method to train the algorithm. You'll need to pass the algorithm your training data X_train and y_train, in that specific order.
This three-step process will be nearly identical for any Scikit-Learn algorithm that we choose to use.
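The import, instantiate, fit pattern can be sketched as follows. The training rows are made-up stand-in data with only a subset of the columns, and random_state=0 is an assumption added here for repeatable results, not something the workshop requires.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier  # step 1: import

# Made-up stand-in training data (a subset of the columns, not the real rows).
train = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.1, 39.5, 40.3, 46.1, 50.0, 48.7],
    "flipper_length_mm": [181.0, 186.0, 195.0, 211.0, 230.0, 210.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0, 4500.0, 5700.0, 4450.0],
})
X_train = train.drop(columns="species")
y_train = train["species"]

# Step 2: create the (untrained) model and store it in a variable.
model = DecisionTreeClassifier(random_state=0)

# Step 3: train it. Features come first, labels second.
model.fit(X_train, y_train)

# After fitting, the model knows which class labels it has seen.
print(model.classes_)
```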
Check the model's accuracy using the X_test and y_test data.
- You can do this manually using the .predict() method and then calculating the percentage of correct predictions, or
- You can use the model's .score() method (this is much easier).
Please save the model's accuracy on the test data to a variable called dt_accuracy.
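Both approaches can be compared side by side in a small sketch. The train and test rows below are made-up stand-in data, and manual_accuracy is a hypothetical variable name used only for this comparison. For a classifier, .score() returns the mean accuracy, so the two numbers should agree.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up stand-in data (a subset of the columns, not the real rows).
train = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.1, 39.5, 40.3, 46.1, 50.0, 48.7],
    "flipper_length_mm": [181.0, 186.0, 195.0, 211.0, 230.0, 210.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0, 4500.0, 5700.0, 4450.0],
})
test = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Adelie", "Gentoo"],
    "bill_length_mm": [36.7, 50.0, 39.3, 47.6],
    "flipper_length_mm": [193.0, 218.0, 190.0, 215.0],
    "body_mass_g": [3450.0, 5700.0, 3650.0, 5400.0],
})
X_train, y_train = train.drop(columns="species"), train["species"]
X_test, y_test = test.drop(columns="species"), test["species"]

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Option 1: predict, then compute the fraction of correct predictions by hand.
predictions = model.predict(X_test)
manual_accuracy = (predictions == y_test).mean()

# Option 2: let the model do the same calculation.
dt_accuracy = model.score(X_test, y_test)

print(manual_accuracy, dt_accuracy)
```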
STRETCH GOAL CHALLENGE!
Stretch Goal Challenges are "extra credit" tasks that weren't necessarily taught to you during our live workshop, but that ask you to use your own brain and research to go a little bit above and beyond on your own.
Can you...
Train a different machine learning algorithm from Scikit-Learn on the same data? Try using a RandomForestClassifier.
You'll follow exactly the same steps that we did for the Decision Tree Classifier.
Save your RandomForestClassifier's final accuracy to a variable called rf_accuracy.
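As a hedged sketch of how the same steps carry over, the snippet below swaps in RandomForestClassifier from sklearn.ensemble. The data is made-up stand-in data, and rf_model, n_estimators=100, and random_state=0 are illustrative choices rather than anything the workshop prescribes.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up stand-in data (a subset of the columns, not the real rows).
train = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.1, 39.5, 40.3, 46.1, 50.0, 48.7],
    "flipper_length_mm": [181.0, 186.0, 195.0, 211.0, 230.0, 210.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0, 4500.0, 5700.0, 4450.0],
})
test = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Adelie", "Gentoo"],
    "bill_length_mm": [36.7, 50.0, 39.3, 47.6],
    "flipper_length_mm": [193.0, 218.0, 190.0, 215.0],
    "body_mass_g": [3450.0, 5700.0, 3650.0, 5400.0],
})
X_train, y_train = train.drop(columns="species"), train["species"]
X_test, y_test = test.drop(columns="species"), test["species"]

# Same pattern as before: create the model, fit it, then score it.
rf_model = RandomForestClassifier(n_estimators=100, random_state=0)
rf_model.fit(X_train, y_train)

rf_accuracy = rf_model.score(X_test, y_test)
print(rf_accuracy)
```

Only the import and the class name change; the .fit() and .score() calls are the same as for the Decision Tree.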