Self-Paced Course
Data Science in Python: Data Prep & EDA
Master the foundations of Python for data science, including project scoping, data gathering & cleaning, EDA, feature engineering, and more.
Course Description
This is a hands-on, project-based course designed to help you master the core building blocks of Python for data science.
We'll start by introducing the fields of data science and machine learning, discussing the difference between supervised and unsupervised learning, and reviewing the data science workflow we'll be using throughout the course.
From there we'll do a deep dive into the data prep & EDA steps of the workflow. You'll learn how to scope a data science project, use Pandas to gather data from multiple sources and handle common data cleaning issues, and perform exploratory data analysis using techniques like filtering, grouping, and visualizing data.
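To give a feel for what cleaning, filtering, and grouping look like in Pandas, here is a minimal sketch. The `listens` table and its column names are made up for illustration and are not taken from the course files.

```python
import pandas as pd

# Hypothetical listening data; the table and column names are illustrative only
listens = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "genre": ["Pop", "pop ", "Rock", "Jazz", None, "Rock"],
    "minutes": [30, 45, 20, 60, 15, 25],
})

# Clean: fix inconsistent text, then drop rows with a missing genre
listens["genre"] = listens["genre"].str.strip().str.title()
cleaned = listens.dropna(subset=["genre"])

# Filter: keep sessions of 25 minutes or longer
long_sessions = cleaned[cleaned["minutes"] >= 25]

# Group: total listening minutes per genre
minutes_by_genre = cleaned.groupby("genre")["minutes"].sum()

print(minutes_by_genre)
```

Standardizing the text before grouping matters here: without it, "Pop" and "pop " would be counted as two separate genres.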
Throughout the course you'll play the role of a Jr. Data Scientist for Maven Music, a streaming service that’s been struggling with customer churn. Using the skills you learn along the way, you'll use Python to gather, clean, and explore the data and provide insights about its customers.
Last but not least, you'll practice preparing data for machine learning models by joining multiple tables, adjusting row granularity, and engineering useful fields and features.
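As a rough sketch of those three steps, assuming two hypothetical tables (`customers` and `listens`, with invented column names), the preparation might look like this:

```python
import pandas as pd

# Hypothetical tables; names and values are illustrative only
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "plan": ["Free", "Premium", "Free"],
})
listens = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3],
    "minutes": [30, 45, 20, 60, 15],
})

# Adjust row granularity: one row per customer instead of one per session
per_customer = listens.groupby("customer_id", as_index=False).agg(
    total_minutes=("minutes", "sum"),
    sessions=("minutes", "count"),
)

# Join the tables into a single customer-level table
model_df = customers.merge(per_customer, on="customer_id", how="left")

# Engineer features: a ratio column and dummy variables for the plan type
model_df["avg_minutes_per_session"] = model_df["total_minutes"] / model_df["sessions"]
model_df = pd.get_dummies(model_df, columns=["plan"])

print(model_df)
```

Aggregating before the join keeps the result at one row per customer, which is the shape most modeling libraries expect.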
If you're an aspiring data scientist looking for an introduction to the world of machine learning with Python, this is the course for you.
COURSE CONTENTS:
8.5 hours on-demand video
16 homework assignments
7 quizzes
2 projects (1 mid-course, 1 final)
2 skills assessments (1 benchmark, 1 final)
COURSE CURRICULUM:
- Welcome to the Course!
- Benchmark Assessment
- Course Introduction
- About This Series
- Course Structure & Outline
- DOWNLOAD: Course Resources
- Introducing the Course Project
- Setting Expectations
- Section Introduction
- What is Data Science?
- Data Science Skill Set
- What is Machine Learning?
- Common Machine Learning Algorithms
- Data Science Workflow
- Step 1: Scoping a Project
- Step 2: Gathering Data
- Step 3: Cleaning Data
- Step 4: Exploring Data
- Step 5: Modeling Data
- Step 6: Sharing Insights
- Data Prep & EDA
- Key Takeaways
- Section Introduction
- Project Scoping Steps
- Think Like an End User
- Brainstorm Problems
- Brainstorm Solutions
- Supervised vs Unsupervised Learning
- Identify Data Requirements
- Data Structures
- Model Features
- Data Sources
- Data Scope
- Summarize the Scope
- Key Takeaways
- Section Introduction
- Why Python?
- Installing Anaconda
- Launching Jupyter Notebook
- The Notebook Interface
- Edit vs Command Mode
- The Code Cell
- The Markdown Cell
- Helpful Resources
- Key Takeaways
- Section Introduction
- Data Gathering Process
- Data Sources
- Structured vs Unstructured Data
- The Pandas DataFrame
- Reading Flat Files
- DEMO: Reading Flat Files
- Reading Excel Files
- Connecting to a SQL Database
- Quickly Exploring a DataFrame
- ASSIGNMENT: Gathering Data
- SOLUTION: Gathering Data
- Key Takeaways
- Introduction
- Data Cleaning Overview
- Data Types
- Converting to DateTime
- Converting to Numeric
- DEMO: Converting Data Types
- ASSIGNMENT: Converting Data Types
- SOLUTION: Converting Data Types
- Data Issues Overview
- Finding Missing Data
- DEMO: Finding Missing Data
- Handling Missing Data
- Removing Missing Data
- Imputing Missing Data
- Resolving Missing Data
- ASSIGNMENT: Missing Data
- SOLUTION: Missing Data
- Finding Inconsistent Text & Typos
- Handling Inconsistent Text & Typos
- Updating Values Based on a Logical Condition
- Mapping Values
- Cleaning Text
- ASSIGNMENT: Inconsistent Text & Typos
- SOLUTION: Inconsistent Text & Typos
- Finding Duplicate Data
- Handling Duplicate Data
- ASSIGNMENT: Duplicate Data
- SOLUTION: Duplicate Data
- Finding Outliers
- Histograms
- Box Plots
- Standard Deviation
- Handling Outliers
- DEMO: Review Cleaned Data
- ASSIGNMENT: Outliers
- SOLUTION: Outliers
- Creating New Columns
- Creating Numeric Columns
- DEMO: Creating Numeric Columns
- ASSIGNMENT: Creating Numeric Columns
- SOLUTION: Creating Numeric Columns
- Creating DateTime Columns
- DEMO: Creating DateTime Columns
- ASSIGNMENT: Creating DateTime Columns
- SOLUTION: Creating DateTime Columns
- Creating Text Columns
- DEMO: Creating Text Columns
- ASSIGNMENT: Creating Text Columns
- SOLUTION: Creating Text Columns
- Key Takeaways
- Introduction
- Exploratory Data Analysis Overview
- Filtering
- DEMO: Filtering
- Sorting
- DEMO: Sorting
- Grouping
- DEMO: Grouping
- ASSIGNMENT: Exploring Data
- SOLUTION: Exploring Data
- Data Visualization Overview
- Data Visualization with Pandas
- DEMO: Data Visualization with Pandas
- Pair Plots
- DEMO: Pair Plots
- Distributions
- DEMO: Distributions
- Common Distributions
- The Normal Distribution
- ASSIGNMENT: Distributions
- SOLUTION: Distributions
- Scatter Plots
- DEMO: Scatter Plots
- Correlations
- DEMO: Correlations
- ASSIGNMENT: Correlations
- SOLUTION: Correlations
- Data Visualization in Practice
- EDA Tips
- Key Takeaways
- Mid-Course Project Overview
- SOLUTION: Exploring Data
- SOLUTION: Creating New Columns
- SOLUTION: Visualizing Data
- Introduction
- Case Study: Preparing for Modeling
- Data Prep for EDA vs Modeling
- Model Preparation Steps
- Creating a Single Table
- Appending
- DEMO: Appending
- Joining
- DEMO: Joining
- Types of Joins
- DEMO: Types of Joins
- DEMO: Creating a Single Table
- ASSIGNMENT: Creating a Single Table
- SOLUTION: Creating a Single Table
- Preparing Rows for Modeling
- DEMO: Preparing Rows for Modeling
- ASSIGNMENT: Preparing Rows for Modeling
- SOLUTION: Preparing Rows for Modeling
- Preparing Columns for Modeling
- Dummy Variables
- DEMO: Dummy Variables
- Preparing DateTime Columns
- DEMO: Preparing DateTime Columns
- ASSIGNMENT: Preparing Columns for Modeling
- SOLUTION: Preparing Columns for Modeling
- Feature Engineering
- Feature Transformations
- Feature Scaling
- Proxy Variables
- Feature Engineering Tips
- ASSIGNMENT: Feature Engineering
- SOLUTION: Feature Engineering
- PREVIEW: Applying Algorithms
- Key Takeaways
- Final Project Overview
- SOLUTION: Gathering Data
- SOLUTION: Cleaning Data
- SOLUTION: EDA
- SOLUTION: Preparing for Modeling
- Final Assessment
- Course Feedback Survey
- Share the love!
- Next Steps
WHO SHOULD TAKE THIS COURSE?
Data analysts or BI experts looking to transition into a data science role
Python users who want to build the core skills required before applying machine learning models
Anyone interested in learning one of the most popular open-source programming languages in the world
WHAT ARE THE COURSE REQUIREMENTS?
- Jupyter Notebooks (free download, we'll walk through the install)
- Familiarity with base Python and Pandas is recommended, but not required
Start learning for FREE, no credit card required!
Every subscription includes access to the following course materials:
- Interactive Project files
- Downloadable e-books
- Graded quizzes and assessments
- 1-on-1 Expert support
- 100% satisfaction guarantee
- Verified credentials & accredited badges
Ready to become a data rockstar?
Start learning for free, no credit card required!