By importing these libraries, I had access to the tools needed to explore and analyze the dataset, create visualizations, and build a predictive model to meet the project objectives.
I used the 'pd.read_csv' function to read the data from the CSV file. The argument encoding='latin1' handles encoding issues present in the data. To get a quick overview of the dataset, I displayed the first 5 rows.
See the screenshot below for the two steps:
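As a minimal sketch of these two steps, the snippet below reads Latin-1 encoded CSV content and shows the first 5 rows. The rows and column names are made up for illustration and stand in for the real file; with an actual file you would pass its path to 'pd.read_csv' instead of the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file: made-up rows, encoded as Latin-1
# so that the accented character in the first caption needs encoding="latin1".
raw_bytes = (
    "Impressions,Likes,Caption\n"
    "3920,162,Data science café notes\n"
    "5394,224,Python tips\n"
    "4021,131,Machine learning projects\n"
    "4528,213,Data visualization ideas\n"
    "2518,123,Statistics basics\n"
    "3884,144,Career advice\n"
).encode("latin1")

# Step 1: read the data, telling pandas how the bytes are encoded.
data = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin1")

# Step 2: display the first 5 rows (head() returns 5 rows by default).
print(data.head())
```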
I then checked the dataset for missing values. There were none, so I proceeded to the next step.
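One common way to run this check in pandas is to count null values per column. The sketch below uses a small made-up DataFrame named 'data'; the column names are assumptions, not the real dataset.

```python
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({
    "Impressions": [3920, 5394, 4021],
    "Likes": [162, 224, 131],
})

# Count missing values in each column; all zeros means nothing is missing.
missing_per_column = data.isnull().sum()
print(missing_per_column)
```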
Verifying the data types is essential to ensure that the data is suitable for the intended analysis. See below:
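A data type check along these lines can be sketched as follows, again on a made-up DataFrame with assumed column names. Splitting the columns into numeric and non-numeric groups makes it easy to see which ones can go straight into plots and correlations.

```python
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({
    "Impressions": [3920, 5394, 4021],
    "Likes": [162, 224, 131],
    "Caption": ["Python tips", "Career advice", "Statistics basics"],
})

# Show the type pandas inferred for each column.
print(data.dtypes)

# Separate numeric columns from everything else.
numeric_columns = data.select_dtypes(include="number").columns.tolist()
text_columns = data.select_dtypes(exclude="number").columns.tolist()
print(numeric_columns, text_columns)
```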
With everything in order, I proceeded with the data analysis phase of my project.
I examined the distribution of impressions from home.
The impressions from the Home section show how far my posts reach my followers. The distribution suggests that it is hard to reach all my followers daily.
Hashtags are used to categorize posts on Instagram so that they can reach more people based on the kind of content being created. The above shows that not every post gets reach from hashtags, but many new users can be reached through them.
The Explore section is the recommendation system on Instagram. It recommends posts to users based on their interests and preferences. Looking at the impressions I have received, I can say that Instagram does not recommend my posts to users very often. Some posts have received a good reach from the Explore section, but it is still very low compared to the reach I receive from hashtags.
Observations: 44.1% of my reach comes from my followers, 33.6% from hashtags, 19.2% from the Explore section, and 3.05% from other sources.
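Percentages like these can be derived by summing the impressions from each source and dividing by the grand total. The sketch below assumes one column per source with the hypothetical names shown, and uses made-up numbers, so it will not reproduce the figures above.

```python
import pandas as pd

# Made-up impressions per source; the column names are assumptions.
data = pd.DataFrame({
    "From Home": [2500, 2700, 2100],
    "From Hashtags": [1000, 1800, 1100],
    "From Explore": [600, 700, 300],
    "From Other": [100, 200, 100],
})

source_columns = ["From Home", "From Hashtags", "From Explore", "From Other"]

# Total impressions per source, then each source's share of the grand total.
totals = data[source_columns].sum()
shares = totals / totals.sum() * 100
print(shares.round(1))
```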
Observation: There is a linear relationship between number of likes and the reach I got from my posts.
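One way to put a number on a relationship like this is to fit a straight line and compute the Pearson correlation. The sketch below uses NumPy on made-up values for likes and impressions; it is an illustration, not the analysis behind the observation above.

```python
import numpy as np

# Made-up values for illustration only.
likes = np.array([162, 224, 131, 213, 123, 144], dtype=float)
impressions = np.array([3920, 5394, 4021, 4528, 2518, 3884], dtype=float)

# Least-squares line: impressions is approximately slope * likes + intercept.
slope, intercept = np.polyfit(likes, impressions, deg=1)

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(likes, impressions)[0, 1]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
```

A positive slope together with a correlation close to 1 is what a roughly linear, increasing relationship looks like numerically.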
Observation: The number of comments on a post does not affect its reach.
Observation: More shares result in a higher reach, but shares do not affect reach as much as likes do.
Observation: There is a linear relationship between the number of times my post is saved and the reach of the Instagram post.
Another critical step in data analysis is understanding the relationship between different variables in the dataset. In this step, I calculated the correlation between every column and impressions.
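A correlation ranking of this kind can be sketched in pandas as shown below. The data is made up and the column names other than 'Impressions' are assumptions. Non-numeric columns are dropped first because correlation is only defined for numbers.

```python
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({
    "Impressions": [3920, 5394, 4021, 4528, 2518, 3884],
    "Likes": [162, 224, 131, 213, 123, 144],
    "Saves": [98, 194, 41, 172, 96, 74],
    "Comments": [9, 7, 11, 10, 5, 9],
    "Caption": ["a", "b", "c", "d", "e", "f"],
})

# Pairwise Pearson correlations between the numeric columns.
correlation = data.select_dtypes(include="number").corr()

# Rank every column by its correlation with impressions.
ranking = correlation["Impressions"].sort_values(ascending=False)
print(ranking)
```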
Observation: More likes and saves will help you get more reach on Instagram, while the number of shares affects your reach less.
The conversion rate is the share of profile visits from a post that turn into new followers. I calculated it as (Follows / Profile Visits) * 100.
Observation: My conversion rate is 41%, which sounds like a very good conversion rate!
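The formula can be applied to the whole dataset by summing both columns first. The sketch below uses made-up numbers and assumed column names, so its result differs from the rate reported above.

```python
import pandas as pd

# Made-up rows for illustration only; the column names are assumptions.
data = pd.DataFrame({
    "Profile Visits": [35, 48, 62, 23],
    "Follows": [2, 10, 12, 0],
})

# Conversion rate = total follows / total profile visits * 100.
conversion_rate = data["Follows"].sum() / data["Profile Visits"].sum() * 100
print(f"{conversion_rate:.1f}%")
```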
Observation: There is a linear relationship between profile visits and followers gained.
In this stage, I trained a machine learning model to predict the reach of an Instagram post. I first split the data into two sets: a training set and a test set. I then trained the model to learn the relationship between the selected features and the target variable ('Impressions'). Finally, I passed new inputs to the model to make a prediction. See the code I used below:
Summary: This is how you can analyze and predict the reach of Instagram posts using Python.
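The workflow described above (split, train, predict) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the original code: the text does not name the model, so 'LinearRegression' is used here as a stand-in, the six features are hypothetical, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only. The six columns stand for hypothetical
# features: likes, saves, comments, shares, profile visits, follows.
rng = np.random.default_rng(42)
n_posts = 100
X = rng.integers(0, 500, size=(n_posts, 6)).astype(float)
assumed_weights = np.array([10.0, 5.0, 2.0, 3.0, 4.0, 6.0])
y = X @ assumed_weights + rng.normal(0, 50, size=n_posts)  # stand-in for 'Impressions'

# Split the data: 80% for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the model (LinearRegression is an assumed stand-in).
model = LinearRegression()
model.fit(X_train, y_train)

# R^2 on the held-out test set shows how well the model generalizes.
test_score = model.score(X_test, y_test)
print(f"Test R^2: {test_score:.3f}")

# Predict impressions for one new post, given its six feature values.
new_post = np.array([[282.0, 233.0, 4.0, 9.0, 165.0, 54.0]])
predicted_impressions = model.predict(new_post)
print(predicted_impressions)
```

Holding out a test set matters because a score computed on the training data alone would overstate how well the model predicts posts it has not seen.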