__STYLES__

Minneapolis/St. Paul Housing Prices: 1970-2021

Tools used in this project
Minneapolis/St. Paul Housing Prices: 1970-2021

Housing Prices

About this project

The original Housing Sales dataset from DoltHub contained over 106 million rows of data. The total number of rows for the state of Minnesota was approximately 1% of the full dataset (1.1 million rows) before data cleaning.

My data cleaning and analysis included the following key steps:

  1. Ran DISTINCT COUNT query on Property Types column and determined there were 872 Property Types in the dataset.

  2. Ran COUNT query on the Property Types and sorted in Descending order to determine which Property Types had the highest number of home sales. Then used this data to determine which Property Types were relevant for my analysis (e.g. Single Family Homes, Townhomes, and Condos) and which could be omitted (e.g. apartments, businesses, public buildings such as schools, etc.). I included 58 Property Types that were various types of Single Family Homes, Townhomes, and/or Condos.

  3. Since the sample size of home sales from 1900 to 1970 in the dataset was very low (in some cases, no sales or less than 10 sales/year) and many rows did not contain prices, I determined that this data would not provide the same level of data quality and meaningful insights as home sales after 1970 and focused my analysis on this time period.

  4. Filtered out all sales with Number of Units in the property > 1.

  5. Combined all of the criteria above into a SQL query to obtain the data that I needed and exported it into a CSV file, which I then imported into Power BI. (See end of this page for the full SQL query.)

  6. Created "Property Type Lookup Table" and assigned a Home Type (Single Family Home, Townhome, or Condo) to each Property Type to consolidate the 872 Property Types into 3 Home Types.

  7. Cleaned County names in Power Query. For example, I replaced all instances of "Ramsey County" and "Ramsay" (incorrect spelling) with "Ramsey."

  8. Ran "Trim" and "Capitalize Each Word" functions in Power Query to remove leading and trailing spaces and for consistency in capitalization of addresses.

  9. Created Date Lookup Table to allow for more intuitive date filtering.

  10. Created calculated column to calculate age of homes when sold (Year Sold - Year Built). In a few cases, this value is less than 0 due to data quality errors in the dataset, so I replaced these values with NULL in Power Query.

NOTE: My SQL query to obtain and export the relevant data via DoltHub was the following:

"SELECT *

FROM sales

WHERE state='MN'

AND (building_num_units IS NULL OR building_num_units = 1)

AND DATE(sale_datetime) <> '1900-01-01'

AND property_type IN ('RESIDENTIAL SINGLE FAMILY-IMPROVED', '201 1A RESIDENTIAL SINGLE UNIT', 'CONDOMINIUM (ALSO MARKET RATE COOPERATIVE)', 'TOWNHOUSE', 'Condominium', 'RESIDENTIAL-TOWNHOUSE', '201 1A RESIDENTIAL SINGLE UNIT SINGLE-FAMILY TWO STORY', '201 1A RESIDENTIAL SINGLE UNIT SINGLE-FAMILY RAMBLER', '201 1A RESIDENTIAL SINGLE UNIT TOWNHOUSE TOWNHOME', 'TOWNHOMES-IMPROVED', '201 1A RESIDENTIAL SINGLE UNIT SINGLE-FAMILY SPLIT ENTRY', 'CONDOMINIUMS -IMPROVED', '201 1A RESIDENTIAL SINGLE UNIT SINGLE-FAMILY 1-2 STORY', '201 1A RESIDENTIAL SINGLE UNIT SINGLE-FAMILY 4 LEVEL SPLIT', '201 1A RESIDENTIAL SINGLE UNIT SINGLE-FAMILY 3 LEVEL SPLIT', 'Townhouse', '201 1A RESIDENTIAL SINGLE UNIT SINGLE-FAMILY 1 1/2 STORY', 'RESIDENTIAL SINGLE FAMILY-VACANT', '201 1A RESIDENTIAL SINGLE UNIT SINGLE-FAMILY 1 3/4 STORY', '201 1A RESIDENTIAL SINGLE UNIT CONDO CONDO', '101 2A AGRICULTURAL SINGLE-FAMILY RAMBLER', '201 1A RESIDENTIAL SINGLE UNIT SINGLE-FAMILY 1 1/4 STORY', '101 2A AGRICULTURAL SINGLE-FAMILY TWO STORY', '201 1A RESIDENTIAL SINGLE UNIT TOWNHOUSE TWO STORY', '101 2A AGRICULTURAL SINGLE-FAMILY 1 1/2 STORY', 'TOWNHOMES-VACANT', '101 2A AGRICULTURAL SINGLE-FAMILY 1-2 STORY', '101 2A AGRICULTURAL SINGLE-FAMILY 1 3/4 STORY', '201 1A RESIDENTIAL SINGLE UNIT TOWNHOUSE RAMBLER', '101 2A AGRICULTURAL SINGLE-FAMILY SPLIT ENTRY', '201 1A RESIDENTIAL SINGLE UNIT SINGLE-FAMILY LOG', '101 2A AGRICULTURAL SINGLE-FAMILY 1 1/4 STORY', '101 2A AGRICULTURAL SINGLE-FAMILY 4 LEVEL SPLIT', '201 1A RESIDENTIAL SINGLE UNIT SINGLE-FAMILY EARTHBERM', '201 1A RESIDENTIAL SINGLE UNIT SINGLE-FAMILY', '101 2A AGRICULTURAL SINGLE-FAMILY 3 LEVEL SPLIT', '201 1A RESIDENTIAL SINGLE UNIT TOWNHOUSE SPLIT ENTRY', '209', '100 RES 1 UNIT', '209 SINGLE-FAMILY 1 1/2 STORY', '209 SINGLE-FAMILY RAMBLER', '209 SINGLE-FAMILY TWO STORY', 'Apartment Condominium', 'REAL PROPERTY', 'RES 1 UNIT', 'RESIDENTIAL/RESIDENTIAL', 'Residential', 'RESIDENTIAL', '201 1A RESIDENTIAL SINGLE UNIT SINGLE-FAMILY RAMBLER', '201 1A RESIDENTIAL SINGLE UNIT TOWNHOUSE TOWNHOME', 'CONDOMINIUM', 'RESIDENTIAL LAKE SHORE', 'RESIDENTIAL-ZERO LOT LINE-DB', '206 4B(4) UNIMPROVED RESIDENTIAL LAND SINGLE-FAMILY TWO STORY', '206 4B(4) UNIMPROVED RESIDENTIAL LAND SINGLE-FAMILY RAMBLER', '206 4B(4) UNIMPROVED RESIDENTIAL LAND TOWNHOUSE TOWNHOME', '206 4B(4) UNIMPROVED RESIDENTIAL LAND SINGLE-FAMILY 1-2 STORY', '206 4B(4) UNIMPROVED RESIDENTIAL LAND SINGLE-FAMILY SPLIT ENTRY', '206 4B(4) UNIMPROVED RESIDENTIAL LAND SINGLE-FAMILY', '206 4B(4) UNIMPROVED RESIDENTIAL LAND TWO STORY')" > C:\Users\Joel\Downloads\FinalDataset.csv

Additional project images

Discussion and feedback(0 comments)
2000 characters remaining
Cookie SettingsWe use cookies to enhance your experience, analyze site traffic and deliver personalized content. Read our Privacy Policy.