Expires in:
This article shows how to profile a zip code column in Excel and accurately count distinct records using COUNTBLANK, COUNT, COUNTIF, and a SUMPRODUCT trick. It is aimed at analysts and Excel users who need reliable data quality checks without relying on newer dynamic array functions. You will learn practical formulas, why they work, and how to handle common edge cases like blanks and numeric zip codes.

What You'll Learn / Key Take-Aways
- How to run basic data profiling checks in Excel: total rows, blanks, numeric values, and valid text records.
- How to count nonblank text records using COUNTIF with a wildcard.
- Why counting distinct values is tricky in classic Excel and how to build a reliable distinct count with SUMPRODUCT and COUNTIF.
- How to handle blanks and numeric-formatted zip codes so they do not distort distinct counts.
## What is data profiling and why it matters
Data profiling is the process of summarizing and understanding the quality, distribution, and characteristics of a data set. Typical profiling checks include row count, number of blanks, numeric versus text values, and distinct values. These checks are essential before reporting or analysis so you are confident the data is complete and formatted correctly.
## Workbook setup and the sample case
In the demo we use a small sample of 10 customer rows and focus on the zip code column. The instructor named the range F3:F12 as zips to make formulas easier to read. You can replicate this approach by naming the zip code range in your workbook.
Key sample columns and named range:
- zips = F3:F12
The goal is to compute:
1. Total rows
2. Distinct zip codes
3. Number of blanks
4. Number of numeric values (bad for zip codes)
5. Valid records (nonblank text)
## Quick profiling formulas
Use these simple built-in formulas first to get a baseline.
- Total rows: =ROWS(zips)
- ROWS counts every row in the reference regardless of content.
- Number of blanks: =COUNTBLANK(zips)
- COUNTBLANK returns how many cells are empty.
- Number of numeric values: =COUNT(zips)
- COUNT counts cells that contain numbers. For zip codes, numbers often indicate incorrect formatting.
- Valid text records (shortcut): =COUNTIF(zips,"*")
- The asterisk wildcard matches any nonblank text. COUNTIF with "*" counts cells that contain text of any length.
You can also compute valid records by subtraction: ROWS(zips) - COUNTBLANK(zips) - COUNT(zips), but COUNTIF with "*" is concise and readable.
## The distinct count problem in classic Excel
There is no built-in COUNTDISTINCT function in older Excel versions. Newer Office 365 users can use UNIQUE and COUNTA to get distinct counts easily, but classic Excel needs a formula workaround.
Challenge details:
- A naive SUMPRODUCT(COUNTIF(zips,zips)) returns an inflated total because duplicates contribute their full counts rather than a single unit each.
- Blank cells are converted to zeros in array operations, which changes how COUNTIF and arithmetic behave.
The trick is to transform the array so each instance of a repeating value contributes a fractional share 1/n, where n is the total count of that value. When you sum those fractions across the array, each distinct value contributes exactly 1.
## Build the distinct count formula step by step
1. Start with the COUNTIF array: COUNTIF(zips,zips)
- This yields an array like {1,0,2,1,2,1,0,1,0,1} where zeros may represent blank cells converted to numeric zero in the array evaluation.
2. Convert those counts to fractional contributions by dividing 1 by that COUNTIF array: 1/COUNTIF(zips,zips)
- For duplicates this yields values such as 1/2 for each of two duplicates.
- Zeros will produce divide by zero errors if left unhandled.
3. Coerce all items to text to prevent blanks becoming numeric zeros. Append an empty string to the criteria array by concatenating: zips & ""
- Using COUNTIF(zips, zips & "") ensures blanks evaluate to blank strings rather than zeros.
4. Multiply by a filter that excludes blanks: (zips <> "")
- This produces an array of TRUE/FALSE values. When multiplied in arithmetic, TRUE becomes 1 and FALSE becomes 0. This step excludes blank rows entirely if you do not want blanks counted as a distinct value.
5. Put it together inside SUMPRODUCT and get the final formula:
=SUMPRODUCT((1/COUNTIF(zips, zips & "")) * (zips <> ""))
Explanation:
- COUNTIF(zips, zips & "") produces an array of counts for each item but treats blanks as blank strings.
- 1/COUNTIF(...) gives fractional contributions for each instance of a repeated value.
- Multiplying by (zips <> "") converts blanks to zero so they are not counted as a distinct value.
- SUMPRODUCT sums the resulting array to yield the number of distinct nonblank values.
## Why the concatenation and logical filter are necessary
- Concatenation with "" forces Excel to treat values as text. Without it, blank cells can become numeric zeros in array contexts. Those zeros cause division errors and incorrect counts.
- The logical filter (zips <> "") excludes blanks when you do not want them to be counted as their own distinct value. If you do want blanks counted, omit that filter and SUMPRODUCT will return a count that includes blanks as a distinct value.
## Using Evaluate Formula to debug
If a formula returns a strange result, the Evaluate Formula tool is your friend. It shows intermediate arrays and how Excel is treating blanks, zeros, and text. When you step through the evaluation you can see how zeros or blank strings appear in the array and where div/0 errors may occur.
## Additional Tips, Pitfalls to Avoid & Pro Advice
- If you have Office 365, prefer dynamic array functions:
- =COUNTA(UNIQUE(zips)) counts distinct values including blanks.
- =COUNTA(UNIQUE(FILTER(zips,zips<>""))) counts distinct nonblank values.
- Keep zip codes formatted as text to preserve leading zeros. If you import data and see ZIPs like 7040 instead of 07040, convert or reformat them as text.
- For large datasets, SUMPRODUCT with COUNTIF can be slow. Consider using helper columns, a pivot table, or Power Query for scalability.
- Be explicit about what counts as a valid record. If a blank zip should be treated as invalid, filter it out. If blanks represent a distinct category you need to count, allow them.
- When concatenating to force text, do not add visible characters. Use "" to avoid changing the apparent cell content.
## FAQ
Q: How do I count distinct values in Excel without Office 365?
A: Use the SUMPRODUCT and COUNTIF trick shown above: =SUMPRODUCT((1/COUNTIF(range, range & "")) * (range <> "")). This returns distinct nonblank counts.
Q: Why are blank cells appearing as zeros in my array formulas?
A: In array evaluations, Excel can coerce blanks to numeric zeros. Concatenate an empty string (range & "") to force text so blanks show as blank strings instead.
Q: How do I count only text zip codes and exclude numeric entries?
A: Use COUNTIF(range, "*") to count text entries. Use COUNT(range) to count numeric entries and subtract as needed.
Q: Can I treat blanks as a distinct value when counting distinct items?
A: Yes. Omit the (range <> "") filter from the SUMPRODUCT formula. With the concatenation trick in place, blanks will count as a single distinct item.
Q: What is the fastest way to get distinct counts for large datasets?
A: For large data, use Power Query, pivot tables, or modern Excel functions (UNIQUE) if available. SUMPRODUCT+COUNTIF works but can be slow on tens or hundreds of thousands of rows.
## Summary of Key Points / Take-Home Messages
- Start profiling with simple counts: ROWS, COUNTBLANK, COUNT, and COUNTIF("*").
- Distinct counts require care because duplicates must contribute fractional shares to the total.
- Coerce values to text with & "" to avoid blanks becoming zeros in array operations.
- Use (range <> "") to exclude blanks from distinct counts when blanks are invalid.
- Prefer UNIQUE and FILTER on Office 365, or Power Query and pivot tables on large datasets.






