## Train a Machine Learning Model using Scikit-Learn

Below is code that will load the Penguins dataset and immediately split 30% of the observations into a `test` dataframe, and the remaining 70% of the observations into a `train` dataframe.

Your task is to use these two datasets to train a Decision Tree algorithm using Scikit-Learn step-by-step.

```
  species  island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex
0  Adelie       2            39.1           18.7              181.0       3750.0    0
1  Adelie       2            39.5           17.4              186.0       3800.0    1
2  Adelie       2            40.3           18.0              195.0       3250.0    1
3  Adelie       2            36.7           19.3              193.0       3450.0    1
4  Adelie       2            39.3           20.6              190.0       3650.0    0
```

```
(103, 7)
```

```
(239, 7)
```
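The loading code itself is not reproduced here. As a rough sketch only, a 70/30 split like the one described could be done with Scikit-Learn's `train_test_split`. The `penguins` dataframe below is a made-up stand-in, and `random_state=42` is an assumed seed; neither comes from the workshop.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in for the Penguins dataframe (10 rows, 2 columns).
penguins = pd.DataFrame({
    "species": ["Adelie"] * 5 + ["Gentoo"] * 5,
    "body_mass_g": [3750.0, 3800.0, 3250.0, 3450.0, 3650.0,
                    4500.0, 5700.0, 4450.0, 5400.0, 4600.0],
})

# Hold out 30% of the rows for testing; random_state is an assumed seed
# that only makes the shuffle repeatable.
train, test = train_test_split(penguins, test_size=0.3, random_state=42)

print(test.shape)   # shape of the held-out test rows
print(train.shape)  # shape of the remaining training rows
```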

## Divide the `train` dataset into `X_train` and `y_train`

- Select the `species` column from the `train` dataframe and save the result to the variable `y_train`.
- Select the remaining columns from the `train` dataframe and save the result to the variable `X_train`.

**Please note the lowercase y and capital X in y_train and X_train, respectively.**
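One possible way to do this with pandas is sketched below. The small `train` dataframe here is a hypothetical stand-in with the same column names as the real one; the Gentoo and Chinstrap rows are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the workshop's `train` dataframe.
# The non-Adelie rows are invented purely for illustration.
train = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap", "Adelie"],
    "island": [2, 0, 1, 2],
    "bill_length_mm": [39.1, 46.1, 46.5, 39.5],
    "bill_depth_mm": [18.7, 13.2, 17.9, 17.4],
    "flipper_length_mm": [181.0, 211.0, 192.0, 186.0],
    "body_mass_g": [3750.0, 4500.0, 3500.0, 3800.0],
    "sex": [0, 1, 1, 1],
})

# Target: the single column we want the model to predict.
y_train = train["species"]

# Features: every column except the target.
X_train = train.drop(columns="species")

print(X_train.shape, y_train.shape)
```

The same two lines, applied to `test` instead of `train`, should give `X_test` and `y_test`.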

## Divide the `test` dataset into `X_test` and `y_test`

- Select the `species` column from the `test` dataframe and save the result to the variable `y_test`.
- Select the remaining columns from the `test` dataframe and save the result to the variable `X_test`.

**Please note the lowercase y and capital X in y_test and X_test, respectively.**

## Train (fit) a `DecisionTreeClassifier` model

Refer back to the code that we wrote together during the workshop to do the following:

- Import the `DecisionTreeClassifier` from `sklearn.tree`.
- Store a `DecisionTreeClassifier` instance in a variable called `model`.
- Use the model's `.fit()` method to train the algorithm. You'll need to pass the algorithm your training data, `X_train` and `y_train`, in that specific order.

This three-step process (import, instantiate, fit) will be nearly identical for any Scikit-Learn algorithm that we choose to use.
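The three steps can be sketched as follows. The tiny `X_train` and `y_train` here are made-up stand-ins so the snippet runs on its own, and `random_state=0` is an assumption added only to make the result repeatable; the workshop code may not set it.

```python
import pandas as pd

# Step 1: import the estimator class.
from sklearn.tree import DecisionTreeClassifier

# Made-up stand-in training data (two features, three species).
X_train = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 46.1, 50.0, 46.5, 50.5],
    "flipper_length_mm": [181.0, 186.0, 211.0, 230.0, 192.0, 200.0],
})
y_train = pd.Series(
    ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"]
)

# Step 2: store an instance in a variable called `model`.
model = DecisionTreeClassifier(random_state=0)  # random_state is an assumption

# Step 3: fit on the training data, features first and target second.
model.fit(X_train, y_train)

print(model.classes_)  # the labels the tree learned to predict
```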

## Check the model's accuracy using the `X_test` and `y_test` data

- You can do this manually by using the `.predict()` method and then calculating the percentage of correct predictions, or
- You can use the model's `.score()` method (this is much easier).

Please save the model's accuracy on the test data to a variable called `dt_accuracy`.
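Both options are sketched below on made-up stand-in data, so the numbers it produces say nothing about the real Penguins dataset. For a classifier, `.score()` returns the mean accuracy, so the two approaches should agree.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up stand-in data; the real X_train/X_test come from the earlier steps.
X_train = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 46.1, 50.0, 46.5, 50.5],
    "flipper_length_mm": [181.0, 186.0, 211.0, 230.0, 192.0, 200.0],
})
y_train = pd.Series(
    ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"]
)
X_test = pd.DataFrame({
    "bill_length_mm": [40.3, 47.5, 49.0],
    "flipper_length_mm": [195.0, 215.0, 198.0],
})
y_test = pd.Series(["Adelie", "Gentoo", "Chinstrap"])

model = DecisionTreeClassifier(random_state=0)  # random_state is an assumption
model.fit(X_train, y_train)

# Option 1: predict, then compute the fraction of correct predictions.
predictions = model.predict(X_test)
manual_accuracy = (predictions == y_test).mean()

# Option 2: let the model compute its mean accuracy directly.
dt_accuracy = model.score(X_test, y_test)

print(manual_accuracy, dt_accuracy)
```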

## STRETCH GOAL CHALLENGE!

Stretch Goal Challenges are "extra credit" tasks that weren't necessarily taught to you during our live workshop, but that ask you to use your own brain and research to go a little bit above and beyond on your own.

Can you...

Train a *different* machine learning algorithm from Scikit-Learn on the same data? Try using a `RandomForestClassifier`.

You'll follow exactly the same steps that we did for the Decision Tree Classifier.

Save your `RandomForestClassifier`'s final accuracy to a variable called `rf_accuracy`.
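As a starting point, here is a minimal sketch of the same import, instantiate, fit, and score pattern with a random forest. The data is a made-up stand-in, the variable name `rf_model` is hypothetical, and `random_state=0` is an assumption added for repeatability.

```python
import pandas as pd

# RandomForestClassifier lives in sklearn.ensemble rather than sklearn.tree.
from sklearn.ensemble import RandomForestClassifier

# Made-up stand-in data; use your real X_train, y_train, X_test, y_test.
X_train = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 46.1, 50.0, 46.5, 50.5],
    "flipper_length_mm": [181.0, 186.0, 211.0, 230.0, 192.0, 200.0],
})
y_train = pd.Series(
    ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"]
)
X_test = pd.DataFrame({
    "bill_length_mm": [40.3, 47.5, 49.0],
    "flipper_length_mm": [195.0, 215.0, 198.0],
})
y_test = pd.Series(["Adelie", "Gentoo", "Chinstrap"])

# Same pattern as before: instantiate, fit, then score on the test data.
rf_model = RandomForestClassifier(random_state=0)  # hypothetical name and seed
rf_model.fit(X_train, y_train)
rf_accuracy = rf_model.score(X_test, y_test)

print(rf_accuracy)
```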