DATA ARCHITECTURE

Star vs Snowflake Schemas

Chapter 09Dimensional & WarehouseSchema Design

Orientation

What You'll Master Here

Chapter 8 built a star with deliberately denormalized dimensions. This chapter examines the alternative, normalizing those dimensions into a snowflake, and gives you a clear, defensible rule for choosing between them. It is one of the most common dimensional-modeling questions, and most explanations online are vague. This one will not be.

You will see the exact same data modeled both ways, the query difference each shape produces, the real trade-offs (speed, storage, simplicity, maintenance), and the handful of situations where a snowflake genuinely earns its extra joins. You will also meet the fact constellation, multiple stars sharing conformed dimensions.

By the end you should be able to look at a dimension and decide, with reasons, whether to keep it flat (star) or normalize it (snowflake), and defend that choice in a design review.

Why it matters

Choosing star vs snowflake affects every query’s speed and every analyst’s experience. A wrong default (over-snowflaking) quietly slows the whole warehouse and frustrates users for no real gain.

Core mental model

Star = flat dimensions, fewer joins, some redundancy. Snowflake = normalized dimensions, more joins, less redundancy. Default to star.

Key terms
star schema
Fact surrounded by flat, denormalized dimensions; one join per dimension.
snowflake schema
A star whose dimensions are normalized into linked sub-tables; multiple join hops.
fact constellation
Several fact tables sharing conformed dimensions (a "galaxy" of stars).
normalization (of a dimension)
Splitting repeated dimension attributes into their own related tables.

Common mistake

Snowflaking every dimension because normalization "feels correct".

You add joins and complexity to every query for storage savings that rarely matter.

Better habit

  • Default to a star; snowflake only with a specific reason.
  • Judge the choice by query simplicity and speed, not tidiness.
  • Share conformed dimensions across stars into a constellation.
The big idea

In analytics, redundancy in a dimension is usually cheap and joins are expensive. That is why the star (more redundancy, fewer joins) is the default, the opposite instinct from OLTP.

How to study this chapter

We model one dimension, product with its category and brand, both ways, so the trade-off is concrete rather than abstract.

Practice prompts

  • Define star and snowflake in one sentence each.
  • State the default and the reason for it.

Remember this

Star and snowflake are the same data at different normalization levels; default to the star and snowflake only for specific, justified reasons.