
(Using R) 📊 Analyzing the Rise and Fall of Programming Languages Using Stack Overflow Data


About this project


Proud to share my recent project where I delved into a decade's worth of Stack Overflow questions to discern the popularity trends of programming languages. Here's a brief glimpse:

🔹 Objective: Determine which programming languages are gaining traction and which ones are waning in terms of usage and popularity.

🔹 Data Source: Stack Overflow's open data from the Stack Exchange Data Explorer, comprising over 16M questions.

🔹 Key Findings:

R and Python are on a steady rise, with Python especially showing a significant upward trend.

JavaScript remains a popular choice, but traditional heavyweights like Java and C# are seeing a decline.

Tags related to newer tools and technologies like Angular and Node.js are seeing increased activity, indicating their growing relevance in the developer community.

🔹 Visualization Tools: Leveraged R's ggplot2 for comprehensive visualizations, helping in clear trend identification.

Here is the Project

Analyzing Rise and Fall of Programming Languages

How can we tell what programming languages and technologies are used by the most people? How about what languages are growing and which are shrinking, so that we can tell which are most worth investing time in?

One excellent source of data is Stack Overflow, a programming question and answer site with more than 16 million questions on programming topics. By measuring the number of questions about each technology, we can get an approximate sense of how many people are using it. We're going to use open data from the Stack Exchange Data Explorer to examine how the relative popularity of languages like R, Python, Java and JavaScript has changed over time.

Each Stack Overflow question is tagged to describe its topic or technology. For instance, there's a tag for languages like R or Python, and for packages like ggplot2 or pandas.

Our data has about 40k rows and 4 columns: the year, the tag, the number of questions with that tag in that year, and the total for that year.

unique(data$year)
##  [1] 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
length(unique(data$tag))
## [1] 4080

So the data covers 11 years, from 2008 to 2018, and contains 4080 unique tags.
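As a quick sanity check, the span and the count of years can be computed directly. This is only a minimal sketch on a hypothetical stand-in data frame (`toy`), since the real `data` object is not reproduced here; only the `year` column matters for the check.

```r
# Hypothetical stand-in with the same `year` values as the real data
toy <- data.frame(year = rep(2008:2018, each = 2))

year_span <- range(toy$year)           # first and last year
n_years   <- length(unique(toy$year))  # number of distinct years

year_span
n_years
```

Counting both endpoints, 2008 through 2018 is 11 distinct years.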

Percentage of each Tag.

We add a new column that holds each tag's share of that year's total, stored as a fraction between 0 and 1.

data = mutate(data, number_percentage = number / year_total)
head(data)
##   year           tag number year_total number_percentage
## 1 2008     .htaccess     54      58390      9.248159e-04
## 2 2008          .net   5910      58390      1.012160e-01
## 3 2008      .net-2.0    289      58390      4.949478e-03
## 4 2008      .net-3.5    319      58390      5.463264e-03
## 5 2008      .net-4.0      6      58390      1.027573e-04
## 6 2008 .net-assembly      3      58390      5.137866e-05
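To make the new column concrete, here is a small worked sketch using two rows copied from the output above. The data frame name `toy` and the extra `percent_label` column are assumptions added for illustration; they are not part of the original analysis.

```r
library(dplyr)

# Two rows copied from the output above; `toy` is a hypothetical name
toy <- data.frame(
  year       = c(2008, 2008),
  tag        = c(".htaccess", ".net"),
  number     = c(54, 5910),
  year_total = c(58390, 58390)
)

toy <- mutate(
  toy,
  number_percentage = number / year_total,               # fraction between 0 and 1
  percent_label     = round(100 * number_percentage, 2)  # same value as a percentage
)

toy
```

For `.net` in 2008 this gives 5910 / 58390, about 0.1012, or roughly 10.12% of that year's total.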

R developing over time

R_over_time = filter(data, tag == "r")
head(R_over_time)
##   year tag number year_total number_percentage
## 1 2008   r      8      58390      0.0001370098
## 2 2009   r    524     343868      0.0015238405
## 3 2010   r   2270     694391      0.0032690516
## 4 2011   r   5845    1200551      0.0048685978
## 5 2012   r  12221    1645404      0.0074273552
## 6 2013   r  22329    2060473      0.0108368321

As we can see, questions tagged R are increasing rapidly over the years, but let's visualize the trend for a closer look.

ggplot(R_over_time,aes(x=year,y=number_percentage)) + geom_line(color="blue")

As we can see, R's share of questions rises continuously over the years, which suggests it is worth learning and practicing. Let's look at the ggplot2 and dplyr tags too.

s_tags = c("dplyr", "r", "ggplot2")
s_over_time = filter(data, tag %in% s_tags)
ggplot(s_over_time, aes(x = year, y = number_percentage, color = tag)) + geom_line()

As we can see, ggplot2 and dplyr are growing, but they have far fewer questions than R.
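The line plots in this section can be made easier to read with percentage labels and one tick per year. The following is a minimal sketch, not the original plotting code: it uses only the R rows shown in the earlier output (2008 to 2013), and the axis labels and title are assumptions.

```r
library(ggplot2)

# R rows copied from the output shown earlier (2008 to 2013 only)
r_toy <- data.frame(
  year = 2008:2013,
  number_percentage = c(0.0001370098, 0.0015238405, 0.0032690516,
                        0.0048685978, 0.0074273552, 0.0108368321)
)

p <- ggplot(r_toy, aes(x = year, y = number_percentage)) +
  geom_line(color = "blue") +
  scale_x_continuous(breaks = r_toy$year) +        # one tick per year
  scale_y_continuous(labels = scales::percent) +   # show fractions as percentages
  labs(x = "Year", y = "Share of questions", title = "Questions tagged r")

p
```

The same two scale layers could be appended to the multi-tag plots as well.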

Top Tags over all the years

sorted_tags = arrange(summarise(group_by(data, tag), item_total = sum(number)), desc(item_total))
head(sorted_tags)
## # A tibble: 6 × 2
##   tag        item_total
##   <chr>           <int>
## 1 javascript    1632049
## 2 java          1425961
## 3 c#            1217450
## 4 php           1204291
## 5 android       1110261
## 6 python         970768

As we can see, javascript, java and c# have the most questions across all years.

s_tags = c("javascript", "java", "c#", "php", "android", "python")
s_over_time = filter(data, tag %in% s_tags)
ggplot(s_over_time, aes(x = year, y = number_percentage, color = tag)) +
  geom_line() +
  coord_cartesian(xlim = c(2008, 2018), ylim = c(0, 0.1))

As we can see, JavaScript and Python are gaining question share over the years, while C#, Android, PHP and Java are losing share, which suggests that their usage is declining.
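One way to put a number on "rising" or "falling" is the least-squares slope of a tag's share against the year. This is a sketch under stated assumptions rather than part of the original analysis: the `r` values are copied from the output above, while `falling_tag` and its values are hypothetical and exist only to show a negative slope.

```r
library(dplyr)

toy <- data.frame(
  year = rep(2008:2013, times = 2),
  tag  = rep(c("r", "falling_tag"), each = 6),
  number_percentage = c(
    0.0001370098, 0.0015238405, 0.0032690516,
    0.0048685978, 0.0074273552, 0.0108368321,   # "r", from the output above
    0.060, 0.055, 0.052, 0.047, 0.043, 0.040    # hypothetical values
  )
)

# cov(x, y) / var(x) is the slope of a simple least-squares line of y on x
trends <- toy %>%
  group_by(tag) %>%
  summarise(slope = cov(year, number_percentage) / var(year)) %>%
  arrange(desc(slope))

trends
```

A positive slope means the tag's share grows on average per year and a negative slope means it shrinks. A straight line is a rough summary, so the plot remains the better guide when a trend changes direction.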

s_tags = c("r", "python", "powerbi", "excel")
s_over_time = filter(data, tag %in% s_tags)
ggplot(s_over_time, aes(x = year, y = number_percentage, color = tag)) + geom_line()

As we can see, both R and Python are growing, but Python has the larger share of questions.

s_tags = c("r", "python")
s_over_time = filter(data, tag %in% s_tags)
ggplot(s_over_time, aes(x = year, y = number_percentage, fill = tag)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.7) +
  theme_minimal()

Biggest Changes over the years

library(viridis)
## Warning: package 'viridis' was built under R version 4.2.3
## Loading required package: viridisLite
increases = data %>%
  group_by(tag) %>%
  summarize(change = number_percentage[which.max(year)] - number_percentage[which.min(year)])
head(arrange(increases, desc(change)))
## # A tibble: 6 × 2
##   tag        change
##   <chr>       <dbl>
## 1 python     0.0633
## 2 javascript 0.0598
## 3 android    0.0597
## 4 angular    0.0283
## 5 r          0.0265
## 6 node.js    0.0261
tail(arrange(increases,desc(change)))
## # A tibble: 6 × 2
##   tag         change
##   <chr>        <dbl>
## 1 windows    -0.0206
## 2 sql-server -0.0213
## 3 c++        -0.0267
## 4 asp.net    -0.0555
## 5 c#         -0.0733
## 6 .net       -0.0929

As we can see, python has the largest increase in share, followed by javascript and android, while .net, c# and asp.net show the largest decreases.

increased = c("python", "javascript", "android", "angular", "r", "node.js")
ggplot(filter(increases, tag %in% increased), aes(x = reorder(tag, change), y = change, fill = tag)) +
  geom_col() +
  theme_minimal() +
  scale_fill_manual(values = viridis(6))
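Note that `which.max(year)` and `which.min(year)` pick each tag's own last and first observed year, so a tag that first appears after 2008 is measured from its first year rather than from 2008. The sketch below shows an equivalent formulation that also records those years. The `r` values come from the output shown earlier; `late_tag` and its values are hypothetical.

```r
library(dplyr)

# Toy data: the "r" values are copied from the output shown earlier;
# "late_tag" and its values are hypothetical, for illustration only
toy <- data.frame(
  year = c(2008, 2013, 2011, 2013),
  tag  = c("r", "r", "late_tag", "late_tag"),
  number_percentage = c(0.0001370098, 0.0108368321, 0.0040, 0.0030)
)

changes <- toy %>%
  group_by(tag) %>%
  arrange(year, .by_group = TRUE) %>%   # order rows by year within each tag
  summarise(
    first_year = first(year),
    last_year  = last(year),
    change     = last(number_percentage) - first(number_percentage)
  ) %>%
  arrange(desc(change))

changes
```

Keeping `first_year` and `last_year` next to `change` makes it easy to spot tags whose change covers a shorter window than the full dataset.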

In the end, you need to keep studying and stay up to date with new tools and changes so that you can keep your job and grow in it.
