Orientation
What You'll Master Here
Data Vault 2.0 is the modeling pattern built for warehouses that must absorb many changing sources while staying fully auditable. Chapter 12 introduced it as one of three architectures; this chapter goes deep on how it actually works, because its mechanics are precise and frequently misunderstood.
You will learn its three building blocks: hubs (business keys), links (relationships), and satellites (descriptive history). You will also learn the loading rules that give it superpowers: hash keys, insert-only loads, hash diffs, and full source stamping. Finally, you will see how the raw vault and business vault feed the dimensional marts that users actually query.
Everything is shown with real DDL and sample rows, because the value of Data Vault (auditability and resilience to change) only makes sense once you see how a hub, link, and satellite are physically shaped and loaded.
Why it matters
Data Vault is increasingly common in regulated, multi-source enterprises and in modern ELT stacks. Knowing its mechanics lets you build, query, or evaluate one instead of treating it as a black box.
Core mental model
Separate keys (hubs), relationships (links), and context (satellites) so that each loads independently, insert-only and fully audited, and never needs rewriting when a source changes.
- hub: A table of unique business keys plus a hash key, load date, and source.
- link: A table recording a relationship (association) between hubs.
- satellite: A table of descriptive attributes and their history, attached to a hub or link.
- hash key: A hash of the business key used as the surrogate, enabling parallel, lookup-free loads.
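To make the hash key definition concrete, here is a minimal sketch of how one might be derived. The choice of MD5, the trim-and-uppercase normalisation, the "||" delimiter, and the sample keys "C-1001" and "ORD-77" are all assumptions made for illustration; this section does not prescribe any of them.

```python
import hashlib


def hash_key(*business_key_parts: str) -> str:
    """Return a deterministic MD5 hex digest for a (possibly composite) business key."""
    # Normalise each part so ' c-1001 ' and 'C-1001' produce the same key (assumed rule).
    normalised = [part.strip().upper() for part in business_key_parts]
    # Join with a delimiter so ('AB', 'C') and ('A', 'BC') hash differently.
    return hashlib.md5("||".join(normalised).encode("utf-8")).hexdigest()


# Hub key: computed from the business key alone, with no lookup against the warehouse.
customer_hk = hash_key("C-1001")
same_customer_hk = hash_key(" c-1001 ")

# Link key: computed from the business keys of the hubs it relates.
customer_order_hk = hash_key("C-1001", "ORD-77")
```

Because the key is a pure function of the business key, any loader can compute it independently, which is what makes parallel, lookup-free loading possible.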
Common mistake
Using Data Vault as the layer business users query directly.
It has many tables and joins. It is an integration core, not a consumption model, so serve marts on top.
Better habit
- Separate keys, relationships, and context into hubs/links/satellites.
- Keep every table insert-only and source-stamped.
- Build dimensional marts on top for consumption.
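The insert-only, source-stamped habit can be sketched with Python's built-in sqlite3 module. The table name sat_customer, its columns, the helper load_satellite, and the sample values are hypothetical. The hash diff here is simply an MD5 over the descriptive attributes, used to decide whether a new history row is needed.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sat_customer (
        customer_hk   TEXT NOT NULL,  -- hash of the business key
        load_date     TEXT NOT NULL,  -- when the row arrived
        record_source TEXT NOT NULL,  -- source stamp
        hash_diff     TEXT NOT NULL,  -- hash of the descriptive attributes
        name          TEXT,
        city          TEXT,
        PRIMARY KEY (customer_hk, load_date)
    )
""")


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def load_satellite(business_key, name, city, source, load_date):
    """Insert a new history row only when the attributes changed. Never updates."""
    customer_hk = md5_hex(business_key.strip().upper())
    hash_diff = md5_hex("||".join([name, city]))
    latest = conn.execute(
        "SELECT hash_diff FROM sat_customer "
        "WHERE customer_hk = ? ORDER BY load_date DESC LIMIT 1",
        (customer_hk,),
    ).fetchone()
    if latest is not None and latest[0] == hash_diff:
        return False  # nothing changed, so nothing is written
    conn.execute(
        "INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?, ?)",
        (customer_hk, load_date, source, hash_diff, name, city),
    )
    return True


# Made-up sample rows: the second load is a repeat, the third changes the city.
results = [
    load_satellite("C-1001", "Ada Lovelace", "London", "crm", "2024-01-01T00:00:00"),
    load_satellite("C-1001", "Ada Lovelace", "London", "crm", "2024-01-02T00:00:00"),
    load_satellite("C-1001", "Ada Lovelace", "Paris", "crm", "2024-01-03T00:00:00"),
]
row_count = conn.execute("SELECT COUNT(*) FROM sat_customer").fetchone()[0]
```

The loader contains no UPDATE or DELETE. History accumulates as new rows, and every row carries the source and load date that produced it.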
Data Vault decomposes the model so the parts that change at different rates (keys vs relationships vs attributes) are stored separately. That separation is the source of its agility and auditability.
Track one entity, customer, through hub, satellite, and link. Seeing the same business key flow through all three makes the pattern click.
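As a rough illustration of one customer flowing through all three structures, the sketch below creates a hub, a satellite, and a link in an in-memory SQLite database and loads a single made-up customer. All table names, column names, and sample values are assumptions for illustration, as is the second hub (hub_order) that the link needs on its other side.

```python
import hashlib
import sqlite3


def hash_key(*parts):
    """MD5 over normalised business key parts (MD5 is an assumed choice)."""
    joined = "||".join(p.strip().upper() for p in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_customer (
        customer_hk   TEXT PRIMARY KEY,
        customer_id   TEXT NOT NULL UNIQUE,  -- the business key
        load_date     TEXT NOT NULL,
        record_source TEXT NOT NULL
    );
    CREATE TABLE hub_order (
        order_hk      TEXT PRIMARY KEY,
        order_id      TEXT NOT NULL UNIQUE,
        load_date     TEXT NOT NULL,
        record_source TEXT NOT NULL
    );
    CREATE TABLE sat_customer (
        customer_hk   TEXT NOT NULL REFERENCES hub_customer (customer_hk),
        load_date     TEXT NOT NULL,
        record_source TEXT NOT NULL,
        hash_diff     TEXT NOT NULL,
        name          TEXT,
        PRIMARY KEY (customer_hk, load_date)
    );
    CREATE TABLE link_customer_order (
        customer_order_hk TEXT PRIMARY KEY,
        customer_hk       TEXT NOT NULL REFERENCES hub_customer (customer_hk),
        order_hk          TEXT NOT NULL REFERENCES hub_order (order_hk),
        load_date         TEXT NOT NULL,
        record_source     TEXT NOT NULL
    );
""")

load_date, source = "2024-01-01T00:00:00", "crm"
customer_hk, order_hk = hash_key("C-1001"), hash_key("ORD-77")
name = "Ada Lovelace"

# The same customer_hk appears in the hub, the satellite, and the link.
conn.execute("INSERT INTO hub_customer VALUES (?, ?, ?, ?)",
             (customer_hk, "C-1001", load_date, source))
conn.execute("INSERT INTO hub_order VALUES (?, ?, ?, ?)",
             (order_hk, "ORD-77", load_date, source))
conn.execute("INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
             (customer_hk, load_date, source,
              hashlib.md5(name.encode("utf-8")).hexdigest(), name))
conn.execute("INSERT INTO link_customer_order VALUES (?, ?, ?, ?, ?)",
             (hash_key("C-1001", "ORD-77"), customer_hk, order_hk, load_date, source))

# Reassembling even one business fact takes three joins.
row = conn.execute("""
    SELECT h.customer_id, s.name, o.order_id
    FROM hub_customer h
    JOIN sat_customer s        ON s.customer_hk = h.customer_hk
    JOIN link_customer_order l ON l.customer_hk = h.customer_hk
    JOIN hub_order o           ON o.order_hk = l.order_hk
""").fetchone()
```

The final query hints at why marts sit on top: answering "which customer placed which order, and what is their name" already touches four tables.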
Practice prompts
- Name what a hub, link, and satellite each store.
- Explain why Data Vault is not the layer users query directly.
Remember this
Data Vault splits a model into hubs (keys), links (relationships), and satellites (history), loaded insert-only and source-stamped, for auditability and resilience to source change.
