__STYLES__

Migrate Underlying Data to SSMS for Automated ETL w/ T-SQL

Tools used in this project
Migrate Underlying Data to SSMS for Automated ETL w/ T-SQL

About this project

Situation

Some time ago, I developed the Global Cost of Living dashboard, which was based on underlying data, mostly residing in Excel. To maintain this dashboard, twice a year, it is necessary to export the latest data from Numbeo.com. The export includes four tables: cities_indexes, cities_prices, country_indexes, and country_prices.

Complication

It is evident that appending the new data to a table in Excel and then running the refresh in Power BI is already a suboptimal process. Furthermore, to make the whole dashboard work, it is necessary to create a geographical lookup table that includes a list of distinct country-city pairs from the four data tables. If there are some countries that do not appear in this list, they need to be inserted as well, leaving the city column blank.

Since every time the Numbeo team updates their data, they might add a few countries or cities that were not listed earlier, the geographical lookup table needs to be dynamic. That is, every time something new is added to the data, it needs to be captured. Doing this in Excel every time is also not optimal and, above all, error-prone.

Solution

I decided to migrate these tables to Microsoft SQL Server Management Studio so that every time I need to upload the new data, I can just save the exports from Numbeo on my laptop, run a query in SSMS to append the new data to the existing tables, then refresh Power BI, and everything would be done.

Step-by-Step Process

First step: I created the database for the Global Cost of Living Project and the necessary tables in SSMS:

Query 1: Creation of the database and tables

CREATE DATABASE global_cost_living;
USE global_cost_living;

CREATE TABLE country_indexes (
    Country VARCHAR(255),
    Cost_of_Living_Index DECIMAL(8, 2),
    Rent_Index DECIMAL(8, 2),
    Cost_of_Living_Plus_Rent_Index DECIMAL(8, 2),
    Groceries_Index DECIMAL(8, 2),
    Restaurant_Price_Index DECIMAL(8, 2),
    Local_Purchasing_Power_Index DECIMAL(8, 2),
    Year INT
);

CREATE TABLE cities_indexes (
    Location_Full VARCHAR(255),
    City VARCHAR(255),
    Country VARCHAR(255),
    Cost_of_Living_Index DECIMAL(8, 2),
    Rent_Index DECIMAL(8, 2),
    Cost_of_Living_Plus_Rent_Index DECIMAL(8, 2),
    Groceries_Index DECIMAL(8, 2),
    Restaurant_Price_Index DECIMAL(8, 2),
    Local_Purchasing_Power_Index DECIMAL(8, 2),
    Year INT
);

CREATE TABLE cities_prices (
    Location_Full VARCHAR(255),
    City VARCHAR(255),
    Country VARCHAR(255),
    K_1 DECIMAL(8, 2),
    K_2 DECIMAL(8, 2),
    -- (Add more columns as needed)
    K_55 DECIMAL(8, 2),
    Year INT
);

CREATE TABLE countries_prices (
    Rank INT,
    Country VARCHAR(255),
    K_1 DECIMAL(8, 2),
    K_2 DECIMAL(8, 2),
    -- (Add more columns as needed)
    K_55 DECIMAL(8, 2),
    Year INT
);

Second step: I populated the tables with data from 2018 to the beginning of 2023:

Query 2: Example of code for importing data into a table

BULK INSERT dbo.cities_indexes
FROM 'C:\Users\stefa\OneDrive\Desktop\SQL\Global_Cost_Living DB\CSVs last update beginning 2023\Cities_Indexes.csv'
WITH
(
    FORMAT='CSV',
    FIRSTROW=2
);

Third step: I added a lookup table to another database that I maintain (useful lookup tables database) containing a list of all countries, their codes, regions, etc. This table doesn't change and is used to assign information to other dynamic geographical tables.

undefinedExtract of the Countries Master Lookup table

Fourth Step: I wrote the query to obtain the desired dynamic lookup table needed in my Power BI dashboard:

Query 3: Dynamic geographical lookup table, based on the four data tables

USE global_cost_living;

WITH city_country_pairs AS (
    SELECT DISTINCT 
        City,
        Country
    FROM (
        SELECT DISTINCT
            global_cost_living.dbo.cities_indexes.City,
            global_cost_living.dbo.cities_indexes.Country
        FROM
            global_cost_living.dbo.cities_indexes
        UNION
        SELECT DISTINCT
            global_cost_living.dbo.cities_prices.City,
            global_cost_living.dbo.cities_prices.Country
        FROM
            global_cost_living.dbo.cities_prices
    ) cities_union
),
countries_only AS (
    SELECT DISTINCT
        City,
        Country
    FROM (
        SELECT DISTINCT
            NULL as 'City',
            global_cost_living.dbo.country_indexes.Country
        FROM
            global_cost_living.dbo.country_indexes
        UNION
        SELECT DISTINCT
            NULL as 'City',
            global_cost_living.dbo.countries_prices.Country
        FROM
            global_cost_living.dbo.countries_prices
    ) countries_union
),
city_country_final AS (
    SELECT 
        City, 
        Country
    FROM 
        city_country_pairs
    UNION
    SELECT 
        City, 
        Country
    FROM 
        countries_only
    WHERE Country NOT IN (SELECT DISTINCT Country FROM city_country_pairs)
)
SELECT
    city_country_final.City,
    city_country_final.Country,
    useful_lookup_tables.dbo.Countries_Master_Lookup.[Country Code ],
    useful_lookup_tables.dbo.Countries_Master_Lookup.Region,
    CASE 
        WHEN city_country_final.City = 'Please Select a City' 
        THEN 'Please Select a Region' 
        ELSE useful_lookup_tables.dbo.Countries_Master_Lookup.[Detailed Region] 
    END AS 'Detailed Region'
FROM
    city_country_final
LEFT JOIN
    useful_lookup_tables.dbo.Countries_Master_Lookup 
        ON city_country_final.Country = useful_lookup_tables.dbo.Countries_Master_Lookup.Country;

Conclusion

Now, every time I need to update my dashboard after the Numbeo data releases, all I need to do is visit their website, export the four data tables as CSV, save them in a folder, and then run the BULK INSERT statement to append this new data to the tables in the SQL database. After that, Power BI needs to be refreshed, and the process is completed.

Discussion and feedback(0 comments)
2000 characters remaining