Self-Paced Course
Data Science in Python: Classification
Master the foundations of classification modeling in Python, including KNN, logistic regression, decision trees, random forests, and gradient boosted machines
Course Description
This is a hands-on, project-based course designed to help you master the foundations of classification modeling in Python.
We’ll start by reviewing the data science workflow, discussing the primary goals & types of classification algorithms, and taking a deep dive into the classification modeling steps we’ll be using throughout the course.
You’ll learn to perform exploratory data analysis, leverage feature engineering techniques like scaling, dummy variables, and binning, and prepare data for modeling by splitting it into train, test, and validation datasets.
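To give you a feel for these data prep steps, here’s a minimal sketch using pandas and scikit-learn. The dataset, column names, and bins below are purely hypothetical stand-ins for illustration, not the actual course files:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical credit data, generated for illustration only
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, 500),
    "age": rng.integers(18, 75, 500),
    "education": rng.choice(["high_school", "bachelors", "masters"], 500),
    "default": rng.integers(0, 2, 500),   # binary target
})

# Dummy variables for a categorical feature
df = pd.get_dummies(df, columns=["education"], drop_first=True)

# Binning a numeric feature into labeled categories
df["age_group"] = pd.cut(df["age"], bins=[17, 30, 45, 60, 75],
                         labels=["18-30", "31-45", "46-60", "60+"])

X = df.drop(columns=["default", "age_group"])
y = df["default"]

# Split into train (60%), validation (20%) & test (20%) sets
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# Scale numeric features, fitting on the training data only to avoid leakage
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training data only (and simply transforming the validation and test sets) is what keeps information from leaking across the splits, a point we’ll return to throughout the course.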
From there, we’ll fit K-Nearest Neighbors & Logistic Regression models, and build an intuition for interpreting their coefficients and evaluating their performance using tools like confusion matrices and metrics like accuracy, precision, and recall. We’ll also cover techniques for modeling imbalanced data, including threshold tuning, sampling methods like oversampling & SMOTE, and adjusting class weights in the model cost function.
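Here’s another quick, illustrative sketch of fitting and evaluating these two models on a small synthetic, imbalanced dataset (generated with scikit-learn’s make_classification, so nothing here reflects the actual course data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

# Hypothetical imbalanced binary classification data (roughly 90% / 10% class split)
X, y = make_classification(n_samples=1_000, n_features=8, weights=[0.9, 0.1], random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

# Fit a KNN model and a logistic regression model with class weights to offset the imbalance
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
logreg = LogisticRegression(class_weight="balanced", max_iter=1_000).fit(X_train, y_train)

# Evaluate with a confusion matrix and core classification metrics
preds = logreg.predict(X_val)
print(confusion_matrix(y_val, preds))
print("Accuracy: ", accuracy_score(y_val, preds))
print("Precision:", precision_score(y_val, preds))
print("Recall:   ", recall_score(y_val, preds))

# Threshold tuning: classify as positive when the predicted probability exceeds a custom cutoff
probs = logreg.predict_proba(X_val)[:, 1]
tuned_preds = (probs >= 0.3).astype(int)
print("Recall at 0.3 threshold:", recall_score(y_val, tuned_preds))
```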
Throughout the course, you'll play the role of Data Scientist for the risk management department at Maven National Bank. Applying the skills you learn along the way, you'll use Python to explore their data and build classification models that accurately determine which customers have high, medium, or low credit risk based on their profiles.
Last but not least, you'll learn to build and evaluate decision tree models for classification. You’ll fit, visualize, and fine-tune these models using Python, then apply your knowledge to more advanced ensemble models like random forests and gradient boosted machines.
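As a final taste, here’s a hedged sketch of the tree-based models covered in those closing chapters, again on synthetic data rather than the course files:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import f1_score

# Hypothetical classification data for illustration only
X, y = make_classification(n_samples=1_000, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A single decision tree, then the two ensemble methods built on top of trees
models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(learning_rate=0.1, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: F1 = {f1_score(y_test, model.predict(X_test)):.3f}")
```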
If you're an aspiring data scientist looking for an introduction to the world of classification modeling with Python, this is the course for you.
COURSE CONTENTS:
9.5 hours on-demand video
18 homework assignments
9 quizzes
2 projects
2 skills assessments (1 benchmark, 1 final)
COURSE CURRICULUM:
- About This Series
- Course Structure & Outline
- READ ME: Important Notes for New Students
- DOWNLOAD: Course Resources
- Introducing the Course Project
- Setting Expectations
- Jupyter Installation & Launch
- What is Data Science?
- The Data Science Skillset
- What is Machine Learning?
- Common Machine Learning Algorithms
- Data Science Workflow
- Data Prep & EDA Steps
- Modeling Steps
- Classification Modeling
- Key Takeaways
- Classification 101
- Goals of Classification
- Types of Classification
- Classification Modeling Workflow
- Key Takeaways
- EDA For Classification
- Defining a Target
- DEMO: Defining A Target
- Exploring The Target
- Exploring The Features
- DEMO: Exploring the Features
- ASSIGNMENT: Exploring the Target & Features
- SOLUTION: Exploring the Target & Features
- Correlation
- PRO TIP: Correlation Matrix
- DEMO: Correlation Matrix
- Feature-Target Relationships
- Feature-Feature Relationships
- PRO TIP: Pair Plots
- ASSIGNMENT: Exploring Relationships
- SOLUTION: Exploring Relationships
- Feature Engineering Overview
- Numeric Feature Engineering
- Dummy Variables
- Binning Categories
- DEMO: Feature Engineering
- Data Splitting
- Preparing Data For Modeling
- ASSIGNMENT: Preparing The Data For Modeling
- SOLUTION: Preparing The Data For Modeling
- Key Takeaways
- K-Nearest Neighbors
- The KNN Workflow
- KNN in Python
- Model Accuracy
- Confusion Matrix
- DEMO: Confusion Matrix
- ASSIGNMENT: Fitting A Simple KNN Model
- SOLUTION: Fitting A Simple KNN Model
- Hyperparameter Tuning
- Overfitting & Validation
- DEMO: Hyperparameter Tuning
- Hard vs. Soft Classification
- DEMO: Probability vs. Event Rate
- ASSIGNMENT: Tuning a KNN Model
- SOLUTION: Tuning a KNN Model
- Pros & Cons of KNN
- Key Takeaways
- Logistic Regression
- Logistic vs. Linear Regression
- The Logistic Function
- Likelihood
- Multiple Logistic Regression
- The Logistic Regression Workflow
- Logistic Regression in Python
- Interpreting Coefficients
- ASSIGNMENT: Logistic Regression
- SOLUTION: Logistic Regression
- Feature Engineering & Selection
- Regularization
- Tuning a Regularized Model
- DEMO: Regularized Logistic Regression
- ASSIGNMENT: Regularized Logistic Regression
- SOLUTION: Regularized Logistic Regression
- Multi-class Logistic Regression
- ASSIGNMENT: Multi-class Logistic Regression
- SOLUTION: Multi-class Logistic Regression
- Pros & Cons of Logistic Regression
- Key Takeaways
- Classification Metrics
- Accuracy, Precision & Recall
- DEMO: Accuracy, Precision & Recall
- PRO TIP: F1 Score
- ASSIGNMENT: Model Metrics
- SOLUTION: Model Metrics
- Soft Classification
- DEMO: Leveraging Soft Classification
- PRO TIP: Precision-Recall & F1 Curves
- DEMO: Plotting Precision-Recall & F1 Curves
- The ROC Curve & AUC
- DEMO: The ROC Curve & AUC
- Classification Metrics Recap
- ASSIGNMENT: Threshold Shifting
- SOLUTION: Threshold Shifting
- Multi-class Metrics
- Multi-class Metrics in Python
- ASSIGNMENT: Multi-class Metrics
- SOLUTION: Multi-class Metrics
- Key Takeaways
- Imbalanced Data
- Managing Imbalanced Data
- Threshold Shifting
- Sampling Strategies
- Oversampling
- Oversampling in Python
- DEMO: Oversampling
- SMOTE
- SMOTE in Python
- Undersampling
- Undersampling in Python
- ASSIGNMENT: Sampling Methods
- SOLUTION: Sampling Methods
- Changing Class Weights
- DEMO: Changing Class Weights
- ASSIGNMENT: Changing Class Weights
- SOLUTION: Changing Class Weights
- Imbalanced Data Recap
- Key Takeaways
- Project Brief
- Solution Walkthrough
- Decision Trees
- Entropy
- Decision Tree Predictions
- Decision Trees in Python
- DEMO: Decision Trees
- Feature Importance
- ASSIGNMENT: Decision Trees
- SOLUTION: Decision Trees
- Hyperparameter Tuning for Decision Trees
- DEMO: Hyperparameter Tuning
- ASSIGNMENT: Tuned Decision Tree
- SOLUTION: Tuned Decision Tree
- Pros & Cons of Decision Trees
- Key Takeaways
- Ensemble Models
- Simple Ensemble Models
- DEMO: Simple Ensemble Models
- ASSIGNMENT: Simple Ensemble Models
- SOLUTION: Simple Ensemble Models
- Random Forests
- Fitting Random Forests in Python
- Hyperparameter Tuning for Random Forests
- PRO TIP: Random Search
- Pros & Cons of Random Forests
- ASSIGNMENT: Random Forests
- SOLUTION: Random Forests
- Gradient Boosting
- Gradient Boosting in Python
- Hyperparameter Tuning for Gradient Boosting
- DEMO: Hyperparameter Tuning for Gradient Boosting
- Pros & Cons of Gradient Boosting
- ASSIGNMENT: Gradient Boosting
- SOLUTION: Gradient Boosting
- PRO TIP: SHAP Values
- DEMO: SHAP Values
- Key Takeaways
- Recap: Classification Models & Workflow
- Pros & Cons of Classification Models
- DEMO: Production Pipeline & Deployment
- Looking Ahead: Unsupervised Learning
- Project Brief
- Solution Walkthrough
WHO SHOULD TAKE THIS COURSE?
- Data analysts or BI experts looking to transition into a data science role
- Python users who want to build the core skills for applying classification models in Python
- Anyone interested in learning one of the most popular open source programming languages in the world
WHAT ARE THE COURSE REQUIREMENTS?
- We strongly recommend taking our Data Prep & EDA and Regression courses first
- Jupyter Notebooks (free download, we'll walk through the install)
- Familiarity with base Python and Pandas is recommended, but not required
Start learning for FREE, no credit card required!
Every subscription includes access to the following course materials:
- Interactive Project files
- Downloadable e-books
- Graded quizzes and assessments
- 1-on-1 Expert support
- 100% satisfaction guarantee
- Verified credentials & accredited badges
Ready to become a data rockstar? Start learning for free, no credit card required!