DATA ARCHITECTURE

Grain: What One Row Means

Chapter 04 · Foundations · Grain

Orientation

What You'll Master Here

If you remember one idea from this entire course, make it this one: the grain of a table is what a single row represents. It sounds almost too simple, and that is exactly why it is the most skipped and most expensive decision in data modeling.

Chapter 3 hinted at it: choosing the composite key (order_id, sku) was really choosing "one row per product per order." Here we make grain explicit, declare it before building, and learn to spot the two failures that wreck metrics: mixing grains in one table, and fan-out when you join tables of different grain.

Every claim is backed by a runnable example with the exact numbers, because grain bugs are silent: the query runs, returns a number, and the number is wrong. Seeing 3 become 4 and 240 become 320 is how the danger becomes real.
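Here is a minimal sketch of how 3 can become 4 and 240 can become 320. The table names, columns, and sample values are hypothetical, chosen only so the totals match those figures; they are not necessarily the dataset used later in the chapter. It uses Python's built-in sqlite3 module so you can run it as is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Grain: one row per order (hypothetical sample data).
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_total INTEGER);
    INSERT INTO orders VALUES (1, 80), (2, 100), (3, 60);

    -- Grain: one row per product per order.
    CREATE TABLE order_items (
        order_id INTEGER,
        sku      TEXT,
        PRIMARY KEY (order_id, sku)
    );
    INSERT INTO order_items VALUES (1, 'A'), (1, 'B'), (2, 'A'), (3, 'C');
""")

# Aggregates taken at the order grain.
orders_only = conn.execute(
    "SELECT COUNT(*), SUM(order_total) FROM orders"
).fetchone()

# The same aggregates after joining to the finer-grain table.
# Order 1 has two items, so its row (and its total of 80) appears twice.
joined = conn.execute("""
    SELECT COUNT(*), SUM(o.order_total)
    FROM orders AS o
    JOIN order_items AS i ON i.order_id = o.order_id
""").fetchone()

print(orders_only)  # (3, 240)
print(joined)       # (4, 320)
```

Neither query raises an error. The only difference is the grain of the rows being counted and summed.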

Why it matters

Grain decides what every count and sum means. Get it wrong and there is no error message, just inflated, untrustworthy numbers that someone eventually bases a decision on.

Core mental model

Grain is the answer to "what does one row mean?" State it as "one row per ___" before you build anything.
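One way to make a declared grain checkable is to treat "one row per ___" as a list of key columns and look for key values that occur on more than one row. The sketch below assumes a hypothetical order_items table and a helper named grain_violations; both are illustrative, not part of any library.

```python
import sqlite3

def grain_violations(conn, table, key_columns):
    """Return (key..., row_count) for every key that appears more than once."""
    keys = ", ".join(key_columns)
    # Identifiers are interpolated directly: fine for a sketch with trusted
    # names, not for user-supplied input.
    sql = (
        f"SELECT {keys}, COUNT(*) FROM {table} "
        f"GROUP BY {keys} HAVING COUNT(*) > 1 ORDER BY {keys}"
    )
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (order_id INTEGER, sku TEXT, quantity INTEGER);
    INSERT INTO order_items VALUES
        (1, 'A', 2),
        (1, 'B', 1),
        (2, 'A', 1),
        (2, 'A', 1);  -- duplicate: breaks "one row per product per order"
""")

# Declared grain: one row per product per order.
declared_grain = ["order_id", "sku"]
violations = grain_violations(conn, "order_items", declared_grain)
print(violations)  # [(2, 'A', 2)]
```

An empty result means the table is consistent with the grain you declared. Any row returned is a key that the declared grain says should be unique but is not.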

Key terms
grain
What a single row of a table represents (one order, one item, one day).
atomic grain
The lowest, most detailed level of a fact (one row per individual event).
fan-out
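Row multiplication when joining a coarse-grain table to a finer-grain one: each coarse row repeats once per matching fine row, so counts and sums of coarse-grain values inflate.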
mixed grain
A single table whose rows represent different things; a design error.
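To see why mixed grain is a design error, consider a sketch in which a summary row has been stored alongside the detail rows it summarizes. The sales table, its row_type column, and the values are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical mixed-grain table: three rows are "one row per order",
    -- the fourth is "one row per day" and repeats the same money.
    CREATE TABLE sales (
        row_type  TEXT,
        sale_date TEXT,
        order_id  INTEGER,
        amount    INTEGER
    );
    INSERT INTO sales VALUES
        ('order',       '2024-01-01', 1,    80),
        ('order',       '2024-01-01', 2,    100),
        ('order',       '2024-01-01', 3,    60),
        ('daily_total', '2024-01-01', NULL, 240);
""")

# Treating every row as if it meant the same thing double-counts the day.
naive = conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()

# Restricting to a single grain recovers the real figures.
order_grain = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM sales WHERE row_type = 'order'"
).fetchone()

print(naive)        # (4, 480)
print(order_grain)  # (3, 240)
```

The filter works here only because this sketch happens to label its rows. In a real mixed-grain table nothing forces every query to remember the filter, which is why the safer design is one table per grain.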

Common mistake

Building a table without saying its grain out loud first.

You discover the grain only when a metric looks wrong, after code and dashboards already depend on the table.

Better habit

  • Write "one row per ___" before creating any table.
  • Confirm what count(*) means before trusting it.
  • Suspect grain first whenever a number looks too high.

The big idea

Almost every "the numbers are wrong" incident is a grain problem in disguise: counting or summing at a grain different from what the question assumed.

How to study this chapter

Watch the numbers in the examples. The same data gives 3 or 4, 240 or 320, depending only on grain. That gap is the entire lesson.

Practice prompts

  • State the grain of three tables you have used, as "one row per ___".
  • Describe a metric that would break if you misjudged the grain.

Remember this

Grain is what one row means; naming it explicitly before building is the single highest-leverage habit in data modeling.