In one of my analytics-focused classes we briefly used local real estate data when explaining regression modeling. I thought it would make a good case study for data analysis. I analyzed the data in R, using RStudio.
This dataset is rich enough to yield multiple layers of insight, so I expect to write several articles about it.
The first step is to describe the market.
For data analysis I use the tidyverse collection of packages. The number of transactions is available from a simple row count with the count function. I refer to them as transactions instead of sales because this data includes changes of ownership through legal instruments such as wills and trusts.
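A minimal sketch of the row count, assuming the transactions sit in a data frame. The name `transactions`, the column names, and the values are all made up for illustration, since the article does not show the real data.

```r
library(dplyr)

# Hypothetical stand-in for the real estate data; names and values are invented.
transactions <- tibble(
  price      = c(250000, 0, NA, 410000, 0),
  year_built = c(1995, 1978, 2004, 2015, 1960)
)

# count() with no grouping variables returns a one-row tibble with a column n;
# pull() extracts that single number.
n_transactions <- transactions %>%
  count() %>%
  pull(n)

n_transactions
```

For a plain number of rows, `nrow(transactions)` gives the same value without the pipeline.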
I could summarize the prices individually with calls to the max, min, and mean functions. However, a single call to summary returns all of these values at once.
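As an illustration of the difference, here is a small sketch on an invented price vector (the values are hypothetical, not taken from the data set). Note that the individual functions need `na.rm = TRUE` when the column contains missing values, while `summary` handles them on its own.

```r
# Hypothetical prices, including zeros and one missing value.
price <- c(250000, 0, NA, 410000, 0, 325000)

# Individual calls: each one needs na.rm = TRUE or it returns NA.
lowest  <- min(price, na.rm = TRUE)
highest <- max(price, na.rm = TRUE)
average <- mean(price, na.rm = TRUE)

# One call: minimum, quartiles, median, mean, maximum, and the NA count.
price_summary <- summary(price)
price_summary
```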
The summary output also counts the number of NAs in the column. To find the number of transactions with a price of 0, I filtered for rows where the price was 0 and counted the result. I assigned the total number of transactions, the number with a price of 0, and the number with a price of NA to variables, then used arithmetic to express the last two as percentages of the total.
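The arithmetic could be sketched as follows. The data frame and variable names are hypothetical, and the NA count is computed here with `is.na` rather than read off the summary output, which is an assumption made to keep the example short.

```r
library(dplyr)

# Hypothetical transactions: 8 rows, 3 with a price of 0, 2 with a missing price.
transactions <- tibble(
  price = c(250000, 0, NA, 410000, 0, 325000, NA, 0)
)

total_n <- nrow(transactions)

# filter() keeps only rows where the condition is TRUE, so NA prices are dropped here.
zero_n <- transactions %>%
  filter(price == 0) %>%
  count() %>%
  pull(n)

# Count of missing prices (the same number summary() reports as NA's).
na_n <- sum(is.na(transactions$price))

# Express each count as a percentage of all transactions.
pct_zero <- zero_n / total_n * 100
pct_na   <- na_n / total_n * 100
```

Keeping the zero and NA counts separate matters because a price of 0 is a recorded value, while NA means no price was recorded at all.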
To find the average age of the houses I used mutate to create an age column by subtracting the year built from 2023. Then I used the mean function on the resulting column.
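A sketch of the age calculation, assuming a column named `year_built` (a hypothetical name). The use of `na.rm = TRUE` is also an assumption, covering the case where some records have no build year.

```r
library(dplyr)

# Hypothetical build years, with one missing value.
houses <- tibble(year_built = c(1995, 1978, 2004, 2015, NA))

# Age as of 2023, the reference year used in the article.
houses <- houses %>%
  mutate(age = 2023 - year_built)

# Average age, skipping rows where the build year is missing.
mean_age <- mean(houses$age, na.rm = TRUE)
mean_age
```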
The analysis so far raises a question: why are there so many transactions with no price? The most obvious possibility is that these properties changed hands through trusts and wills. Confirming that would require more analysis.