Udemy Course Analytics

About this project

Submission for the Onyx DataDNA dataset challenge, January 2024.

Business Case

Create an effective report from the Udemy courses dataset, which contains information about various online courses available on Udemy. The report should let users drill down and filter data quickly and intuitively, so they can answer any question they might have and analyse the data across all of its dimensions. More at https://onyxdata.co.uk/data-dna-dataset-challenge/

The dataset

An Excel file with 12 columns and 3678 rows. It includes information such as course title, URL, basic pricing information, number of subscribers, reviews, lectures, course level and content duration. The dataset provides a high-level overview of a sample of Udemy's course offerings from 2011 to July 2017, enabling some degree of analysis and insight.

Full data dictionary:

  • course_id: A unique identifier for each course.
  • course_title: The title of the course.
  • url: URL of the course on Udemy.
  • is_paid: Indicates whether the course is paid or free.
  • price: The price of the course (if it's a paid course).
  • num_subscribers: The number of subscribers for the course.
  • num_reviews: The number of reviews the course has received.
  • num_lectures: The number of lectures in the course.
  • level: The level of the course (e.g., All Levels, Intermediate Level).
  • content_duration: The duration of the course content in hours.
  • published_timestamp: The date and time when the course was published.
  • subject: The subject category of the course.
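For readers who prefer to see the dictionary as a data structure, here is a minimal sketch of one raw row as a Python dataclass. The class name `CourseRow` and all of the Python types are assumptions made for illustration; the source only names and describes the columns.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass
class CourseRow:
    """One row of the raw dataset, following the data dictionary.

    The Python types below are assumptions; the source only describes
    what each column means.
    """
    course_id: int                  # unique identifier for each course
    course_title: str
    url: str
    is_paid: bool                   # paid or free
    price: float                    # meaningful only for paid courses
    num_subscribers: int
    num_reviews: int
    num_lectures: int
    level: str                      # e.g. "All Levels", "Intermediate Level"
    content_duration: float         # hours
    published_timestamp: datetime
    subject: str


# Column names in dictionary order, handy for validating a loaded file.
COLUMN_NAMES = [f.name for f in fields(CourseRow)]
```

`COLUMN_NAMES` can be compared against the header row of the Excel file to confirm that all 12 columns are present before any cleaning starts.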

The process

  • ETL stage, done in Power Query within Power BI. Our concern at this point was to clean and validate the data: setting every column to the correct data type and removing duplicates and unwanted columns or information. The column 'course_id' was removed because it served no purpose here (it could have been useful for joins, but not in this context) and had relatively high cardinality. The column 'published_timestamp' was converted into 'release_date' because the time information was irrelevant to our analysis needs. In the end, 3672 rows and 11 columns were loaded into our Fact_courses table.
  • Data modeling with a star schema approach. I created a date dimension table with DAX functions and a subject dimension table, and linked both to our Fact table of courses to create a simple model, as pictured below:
    [Image: star schema data model]
  • The later stages included defining metrics and calculations using DAX, and designing the wireframe in Figma in accordance with the visuals we intended to use to tell our story.
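The ETL and modeling steps above were done in Power Query and DAX, which are not shown in this write-up. As a rough illustration of the same logic, here is a minimal Python sketch using only the standard library. The function names, the sample rows, the timestamp format, and the choice to deduplicate on exact raw rows are all assumptions, not the author's actual steps.

```python
from datetime import datetime, timedelta

# Assumption: timestamps look like "2017-01-01T10:00:00Z".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def clean_courses(raw_rows):
    """Remove exact duplicates, drop course_id, keep only the release date."""
    seen = set()
    fact_rows = []
    for row in raw_rows:
        key = tuple(sorted(row.items()))  # identifies an exact duplicate row
        if key in seen:
            continue
        seen.add(key)
        cleaned = {k: v for k, v in row.items()
                   if k not in ("course_id", "published_timestamp")}
        # The time of day is irrelevant to the analysis, so keep the date only.
        ts = datetime.strptime(row["published_timestamp"], TIMESTAMP_FORMAT)
        cleaned["release_date"] = ts.date()
        fact_rows.append(cleaned)
    return fact_rows


def build_dim_date(fact_rows):
    """One row per calendar day between the first and last release date."""
    start = min(r["release_date"] for r in fact_rows)
    end = max(r["release_date"] for r in fact_rows)
    all_days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [{"date": d, "year": d.year, "quarter": (d.month - 1) // 3 + 1,
             "month": d.month} for d in all_days]


def build_dim_subject(fact_rows):
    """Distinct subjects, used as the second dimension of the star schema."""
    return sorted({r["subject"] for r in fact_rows})


# Made-up sample rows (not taken from the dataset); the second is a duplicate.
raw = [
    {"course_id": 1, "course_title": "Course A", "subject": "Subject A",
     "num_subscribers": 100, "published_timestamp": "2017-01-01T10:00:00Z"},
    {"course_id": 1, "course_title": "Course A", "subject": "Subject A",
     "num_subscribers": 100, "published_timestamp": "2017-01-01T10:00:00Z"},
    {"course_id": 2, "course_title": "Course B", "subject": "Subject B",
     "num_subscribers": 50, "published_timestamp": "2017-01-03T08:30:00Z"},
]

fact_courses = clean_courses(raw)
dim_date = build_dim_date(fact_courses)
dim_subject = build_dim_subject(fact_courses)
```

In this sketch the fact rows relate to the two dimensions through `release_date` and `subject`, which mirrors the relationships described in the model. A DAX date table may well cover whole calendar years rather than only the span between the first and last release date, so treat the date range here as a simplification.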

The limitations

My main concern was the dataset's limitations for time intelligence analysis. Metrics such as subscribers, reviews and lectures are recorded as of the dataset's last update and are not necessarily associated with the course's publication date. For example, a course published in 2012 with over 30,000 subscriptions could have gained many of them at any point between its release date and the last update of the dataset in July 2017. The dataset is also too limited for a more comprehensive analysis of trends in Udemy's course offerings or for assessing the impact of marketing strategies.
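To make the first limitation concrete, the sketch below compares two made-up courses by an average subscription rate instead of raw totals. This is not part of the report; the function name, the snapshot day (the source only says July 2017), and the sample figures are assumptions chosen for illustration.

```python
from datetime import date

# Assumption: the exact day of the last update is not given, only "July 2017".
SNAPSHOT = date(2017, 7, 31)


def subscribers_per_year_online(num_subscribers, release_date, snapshot=SNAPSHOT):
    """Average subscribers gained per year between release and snapshot."""
    days_online = (snapshot - release_date).days
    if days_online <= 0:
        raise ValueError("release_date must be before the snapshot date")
    return num_subscribers / (days_online / 365.25)


# Two hypothetical courses: an older one with more subscribers in total,
# and a newer one with fewer subscribers but far less time to collect them.
older = subscribers_per_year_online(30000, date(2012, 7, 31))
newer = subscribers_per_year_online(10000, date(2016, 7, 31))
```

With these made-up numbers the newer course has the higher yearly rate even though its total is lower, which is why totals grouped by release year can mislead. The rate is still only an average: the dataset cannot tell us in which months or years the subscriptions actually happened.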

Credits

All icons used in this project were downloaded free for use from this amazing website; credits to their respective amazing creators. The images used for the subject slicer came from Freepik.

Cover Photo by Scott Graham on Unsplash


Discussion and feedback (2 comments)
Branislav Poljasevic
10 months ago
Love the funnel analysis, Jacques.