
Optimizing Marketing Investments: Unraveling the Impact of Promotional Budgets on Sales

About this project

Introduction

In this project, I carried out a simple linear regression analysis to explore the relationship between two continuous variables. As part of an analytics team focused on marketing and sales insights, I worked on a project centered on influencer marketing. The primary objective was to investigate the relationship between promotional budgets and sales. Our company's decision-makers rely on this analysis to inform future marketing strategies and investments, so it was crucial to understand clearly how each promotional channel relates to revenue.

This project allowed me to strengthen my knowledge of linear regression and hone my skills in evaluating regression results. The insights gained will be invaluable in providing data-driven business recommendations in the future.

Step 1: Imports and Data Loading

To begin the analysis, I imported the necessary Python libraries - pandas, matplotlib.pyplot, and seaborn - to manipulate and visualize the data. Additionally, I used the statsmodels library to build and fit the linear regression model.

Step 2: Data Exploration

I started with exploratory data analysis (EDA) to familiarize myself with the dataset and prepare it for modeling. The dataset consisted of the following features:

  • TV promotion budget (in millions of dollars)
  • Social media promotion budget (in millions of dollars)
  • Radio promotion budget (in millions of dollars)
  • Sales (in millions of dollars)

Each row represented an independent marketing promotion, with investments made in TV, social media, and radio promotions to boost sales. The primary aim was to identify the feature that most strongly predicted sales.

To achieve this, I performed the following steps:

  1. Calculated the number of rows and columns in the data.
  2. Generated descriptive statistics for the TV, radio, and social media promotion budgets.
  3. Calculated the percentage of missing values in the sales column, which was negligible (0.13%).
  4. Removed rows with missing sales data.
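The four steps above can be sketched with pandas as follows. The data here is synthetic and generated only so the example runs; the column names, the slope of 3.5, and the two injected missing values are assumptions, not the project's data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the promotions dataset (not the real data).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "TV": rng.uniform(10, 100, n),
    "Radio": rng.uniform(0, 40, n),
    "Social_Media": rng.uniform(0, 10, n),
})
data["Sales"] = 3.5 * data["TV"] + rng.normal(0, 3, n)
data.loc[[5, 50], "Sales"] = np.nan  # inject two missing sales values

# 1. Number of rows and columns.
n_rows, n_cols = data.shape

# 2. Descriptive statistics for the three budget columns.
budget_stats = data[["TV", "Radio", "Social_Media"]].describe()

# 3. Percentage of missing values in the sales column.
missing_pct = data["Sales"].isna().mean() * 100

# 4. Drop rows with missing sales and renumber the index.
data = data.dropna(subset=["Sales"]).reset_index(drop=True)

print(n_rows, n_cols)
print(budget_stats)
print(f"Missing sales: {missing_pct:.2f}%")
```

Dropping rows is reasonable here only because the share of missing sales values is so small; with a larger share, imputation or a closer look at why values are missing would be worth considering.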

Next, I visualized the distribution of sales using a histogram, which showed that sales were evenly distributed between $25 million and $350 million.

Step 3: Model Building

In this step, I constructed a pairplot to visualize relationships between pairs of variables and identify the feature with the strongest linear relationship with sales. Based on the pairplot, I chose TV as the independent variable X for the simple linear regression model.

I then built and fitted the model using the statsmodels library. The linear equation for the model is:

Sales (in millions) = -0.1263 + 3.5614 * TV (in millions)

The R-squared value for the model was 0.999, indicating that 99.9% of the variation in sales could be explained by the TV promotional budget alone.
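Fitting such a model with statsmodels can be sketched as follows. I use the formula interface here as an assumption; the write-up does not say whether the formula API or sm.OLS was used. The data is synthetic, so the fitted numbers will differ from the project's reported coefficients.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic data with a known slope of 3.5 (not the project's data).
rng = np.random.default_rng(3)
n = 300
tv = rng.uniform(10, 100, n)
data = pd.DataFrame({
    "TV": tv,
    "Sales": 3.5 * tv + rng.normal(0, 3, n),
})

# Ordinary least squares: Sales = intercept + slope * TV
model = ols(formula="Sales ~ TV", data=data).fit()

intercept = model.params["Intercept"]
slope = model.params["TV"]
r_squared = model.rsquared

print(model.summary())  # full table: coefficients, p-values, confidence intervals
print(f"Sales = {intercept:.4f} + {slope:.4f} * TV, R-squared = {r_squared:.3f}")
```

The summary table is where the coefficient estimates, the p-values, and the 95% confidence intervals can be read off.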

Step 4: Results and Evaluation

To evaluate the model, I checked the four assumptions of linear regression:

  1. Linearity: The scatterplot of TV against sales demonstrated a clear linear relationship, confirming the linearity assumption.
  2. Independence: Each row represented a separate marketing promotion, so I treated the observations as independent.
  3. Normality: The histogram of residuals was approximately normal, supporting the normality assumption.
  4. Homoscedasticity: The scatterplot of fitted values against residuals showed roughly constant variance across the range of fitted values, supporting the homoscedasticity assumption.

Interpreting the model results, I found that each additional one million dollars in the TV promotional budget was associated with an estimated $3.5614 million increase in sales. The coefficient for TV had a p-value reported as 0.000 (that is, below 0.001), and its 95% confidence interval was [3.558, 3.565], indicating little uncertainty in the estimate.
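To make the coefficient concrete, the reported equation can be applied directly. The intercept and slope below are the values reported above; the TV budget of 50 million dollars is a hypothetical input chosen only for illustration.

```python
# Coefficients reported for the fitted model (in millions of dollars).
INTERCEPT = -0.1263
SLOPE = 3.5614


def predict_sales(tv_budget_millions: float) -> float:
    """Predicted sales (millions) for a given TV budget (millions)."""
    return INTERCEPT + SLOPE * tv_budget_millions


base = predict_sales(50.0)    # hypothetical budget of 50 million
bumped = predict_sales(51.0)  # same budget plus one million

print(f"Predicted sales at 50M TV budget: {base:.4f}M")
print(f"Change from one extra million:    {bumped - base:.4f}M")
```

The difference between the two predictions equals the slope, which is exactly what the coefficient interpretation says. Predictions like this are only trustworthy for budgets within the range seen in the data.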

Considerations

This project provided valuable insights into the relationship between marketing promotional budgets and sales. Key takeaways include the importance of EDA to identify suitable variables for regression, checking assumptions before interpreting results, and providing measures of uncertainty (p-values, confidence intervals) with coefficient estimates.

I would recommend that leadership prioritize the TV promotional budget over other channels, as TV has the strongest positive linear relationship with sales. This recommendation is supported by the high R-squared value, the low p-value, and the narrow confidence interval for the TV coefficient, which together indicate high confidence in the estimated relationship between TV promotions and sales. As a next step, I would explore using both TV and radio as independent variables and create plots to communicate the results more clearly.

Overall, this project provided essential insights that will aid in making informed marketing investment decisions to drive revenue growth and maximize return on promotional spending.
