__STYLES__
Some time ago, I developed the Global Cost of Living dashboard, which was based on underlying data, mostly residing in Excel. To maintain this dashboard, twice a year, it is necessary to export the latest data from Numbeo.com. The export includes four tables: cities_indexes, cities_prices, country_indexes, and country_prices.
It is evident that appending the new data to a table in Excel and then running the refresh in Power BI is already a suboptimal process. Furthermore, to make the whole dashboard work, it is necessary to create a geographical lookup table that includes a list of distinct country-city pairs from the four data tables. If there are some countries that do not appear in this list, they need to be inserted as well, leaving the city column blank.
Since every time the Numbeo team updates their data, they might add a few countries or cities that were not listed earlier, the geographical lookup table needs to be dynamic. That is, every time something new is added to the data, it needs to be captured. Doing this in Excel every time is also not optimal and, above all, error-prone.
I decided to migrate these tables to Microsoft SQL Server Management Studio so that every time I need to upload the new data, I can just save the exports from Numbeo on my laptop, run a query in SSMS to append the new data to the existing tables, then refresh Power BI, and everything would be done.
First step: I created the database for the Global Cost of Living Project and the necessary tables in SSMS:
Query 1: Creation of the database and tables
CREATE DATABASE global_cost_living;
USE global_cost_living;
CREATE TABLE country_indexes (
Country VARCHAR(255),
Cost_of_Living_Index DECIMAL(8, 2),
Rent_Index DECIMAL(8, 2),
Cost_of_Living_Plus_Rent_Index DECIMAL(8, 2),
Groceries_Index DECIMAL(8, 2),
Restaurant_Price_Index DECIMAL(8, 2),
Local_Purchasing_Power_Index DECIMAL(8, 2),
Year INT
);
CREATE TABLE cities_indexes (
Location_Full VARCHAR(255),
City VARCHAR(255),
Country VARCHAR(255),
Cost_of_Living_Index DECIMAL(8, 2),
Rent_Index DECIMAL(8, 2),
Cost_of_Living_Plus_Rent_Index DECIMAL(8, 2),
Groceries_Index DECIMAL(8, 2),
Restaurant_Price_Index DECIMAL(8, 2),
Local_Purchasing_Power_Index DECIMAL(8, 2),
Year INT
);
CREATE TABLE cities_prices (
Location_Full VARCHAR(255),
City VARCHAR(255),
Country VARCHAR(255),
K_1 DECIMAL(8, 2),
K_2 DECIMAL(8, 2),
-- (Add more columns as needed)
K_55 DECIMAL(8, 2),
Year INT
);
CREATE TABLE countries_prices (
Rank INT,
Country VARCHAR(255),
K_1 DECIMAL(8, 2),
K_2 DECIMAL(8, 2),
-- (Add more columns as needed)
K_55 DECIMAL(8, 2),
Year INT
);
Second step: I populated the tables with data from 2018 to the beginning of 2023:
Query 2: Example of code for importing data into a table
BULK INSERT dbo.cities_indexes
FROM 'C:\Users\stefa\OneDrive\Desktop\SQL\Global_Cost_Living DB\CSVs last update beginning 2023\Cities_Indexes.csv'
WITH
(
FORMAT='CSV',
FIRSTROW=2
);
Third step: I added a lookup table to another database that I maintain (useful lookup tables database) containing a list of all countries, their codes, regions, etc. This table doesn't change and is used to assign information to other dynamic geographical tables.
Extract of the Countries Master Lookup table
Fourth Step: I wrote the query to obtain the desired dynamic lookup table needed in my Power BI dashboard:
Query 3: Dynamic geographical lookup table, based on the four data tables
USE global_cost_living;
WITH city_country_pairs AS (
SELECT DISTINCT
City,
Country
FROM (
SELECT DISTINCT
global_cost_living.dbo.cities_indexes.City,
global_cost_living.dbo.cities_indexes.Country
FROM
global_cost_living.dbo.cities_indexes
UNION
SELECT DISTINCT
global_cost_living.dbo.cities_prices.City,
global_cost_living.dbo.cities_prices.Country
FROM
global_cost_living.dbo.cities_prices
) cities_union
),
countries_only AS (
SELECT DISTINCT
City,
Country
FROM (
SELECT DISTINCT
NULL as 'City',
global_cost_living.dbo.country_indexes.Country
FROM
global_cost_living.dbo.country_indexes
UNION
SELECT DISTINCT
NULL as 'City',
global_cost_living.dbo.countries_prices.Country
FROM
global_cost_living.dbo.countries_prices
) countries_union
),
city_country_final AS (
SELECT
City,
Country
FROM
city_country_pairs
UNION
SELECT
City,
Country
FROM
countries_only
WHERE Country NOT IN (SELECT DISTINCT Country FROM city_country_pairs)
)
SELECT
city_country_final.City,
city_country_final.Country,
useful_lookup_tables.dbo.Countries_Master_Lookup.[Country Code ],
useful_lookup_tables.dbo.Countries_Master_Lookup.Region,
CASE
WHEN city_country_final.City = 'Please Select a City'
THEN 'Please Select a Region'
ELSE useful_lookup_tables.dbo.Countries_Master_Lookup.[Detailed Region]
END AS 'Detailed Region'
FROM
city_country_final
LEFT JOIN
useful_lookup_tables.dbo.Countries_Master_Lookup
ON city_country_final.Country = useful_lookup_tables.dbo.Countries_Master_Lookup.Country;
Conclusion
Now, every time I need to update my dashboard after the Numbeo data releases, all I need to do is visit their website, export the four data tables as CSV, save them in a folder, and then run the BULK INSERT statement to append this new data to the tables in the SQL database. After that, Power BI needs to be refreshed, and the process is completed.