The use of absolute numbers doesn't always lead to the right decisions, prompting a common practice of scaling them. For instance, consider the count of sunburn cases in summer. When fewer people swim, this count decreases; with more people enjoying the sun, it rises. Just observing this number might suggest an increased risk of sunburn, but in reality, the risk remains the same when scaled relative to the number of sunbathers. Gaining insights on this requires relative values, which aren't always easily obtainable as they're tied to the number of people involved.

Populations are not static; they change over time. Comparing different states or cities reveals varying rates of change in population. Some areas may see minimal change, while others might experience substantial increases. Dividing past cases by a higher population than the actual one in this year results in an underestimation of the original values, introducing a deviation in the data.

Table with populations when all years are selected.

To solve this issue, thre are **three potential methods** for scaling, assuming one knows the cases per year. And last but not least in addition the best practice - in my opinion.

One approach involves inputting the population data for each year in my case, the anual population. Then, in calculations, the average number of people within the chosen period is used. However, this method isn't entirely accurate.

To achieve high precision, weighted ratios would be necessary. This process entails calculating the ratio for each year and multiplying it by the reported cases. This calculation must be performed for every year within the selected period. Subsequently, these values are summed and divided by the total reported cases.

However, due to its time-intensive nature, I opted for a closer assumption, which involves using the average of the selected period. This approach was then compared to the time-saving alternatives.

This method is the "quick and dirty" solution, involving pulling in the population data for a single year—in my case, the population of 2022. The approach entails taking all reported cases and dividing them by the population of 2022. However, a significant caveat arises here. In my scenario, relying solely on the most recent population figures led to less accurate results, particularly for earlier years like 2004, with deviations reaching around 39%.

I expanded on this method by calculating the average between the population at the start and end points of the period. When observing a specific year, like 2004 as depicted in the image above, this approach yields better results than employing the population of 2022. However, it falls short of the accuracy achieved by using the precise average population for that specific year, which is the population of 2004 in this instance.

With multiple years selected, the results improve gradually. The deviation lessens as more years are taken into account. and as states are affected, that had small incremental changes.

So,I hope you can see, that it is not that easy to normalize values to populations. At least it is not done as fast as you may think in the first approach as the most accurate solution would be pulling in the annual population for each state.

But what would be an experiment with the best solution, that is both? **Accurate enough and time saving?**

For this one you also just have to pull in the populations of 2002 and 2022 and then you adjust the population for the selected years. So you make an linear interpolation between the year 2002 and 2022. Technically speaking this is not as eassy as it sounds, because by selecting more than one year, you first need to calculate an average year.

Besides of that you need:

```
Population 2002 = CALCULATE
(SUM(Population[Population]),
Dates[Year]=2002)
```

```
Population 2002 keepfiltrs =
CALCULATE(
Population[Population 2002],
KEEPFILTERS(
Population[Year] = 2002
)
)
```

The same has to be done with the year 2022.

And in my case the formula for adjusting the population looks like that:

```
Adj Pop = Population[Population 2002 keepfiltrs]+((Population[Population 2022 keepfiltrs]-Population[Population 2002 keepfiltrs])/20*([Average Year]-2002))
```

As the poluation is growing more or less linear, for me a linear interpolation was enough. If you can see that your sample is not growing linear, you also need to adapt the interpolation.

By using this formula I get a population that is pretty one to the acutal annual population:

So I hope this was helpful! And to get more into the issue, you can use the published Power BI dashboard.

Best, Jasmin

*Population Data taken from Census.gov*

Discussion and feedback(2 comments)

comment-749-avatar

Chris Dutton

5 months ago2000 characters remaining