Learning level
Conceptual, logical, and physical levels, and the real, compounding cost of a model that was never designed on purpose.
Read and draw ER diagrams: entities, attributes, crow’s-foot notation, 1:1 / 1:M / M:N, and bridge tables.
Natural, surrogate, and composite keys, foreign keys, and referential integrity, where most real data bugs begin.
Declare the grain before you build, and avoid the mixed-grain disasters that silently corrupt every metric downstream.
Functional dependencies, 1NF through BCNF, update anomalies, and when to deliberately denormalize for reads.
Turn requirements into a normalized operational schema, with the why: write integrity, transactions, and constraints.
Why operational and analytical models differ, how to decide between them, and the reasoning behind each structure.
Learning level
The Kimball core of analytics modeling: facts measure events, dimensions give them context, built around a clear grain.
The classic tradeoff between a flat star and a normalized snowflake, and exactly when each one is the right call.
Additive, semi-additive, and non-additive measures; transaction, snapshot, and accumulating facts; factless and degenerate.
Track dimension history correctly, from overwrite to history rows to hybrids, the number-one modeling interview topic.
The three enterprise modeling philosophies compared, with a clear map of when each approach actually fits.
Hubs, links, and satellites: an auditable, scalable, parallel-loadable pattern for enterprise data warehouses.
Medallion (bronze/silver/gold), wide tables / One Big Table, dbt models, ELT, and the semantic / metrics layer.
Effective dating, bitemporal models, snapshots, and event sourcing, so you can ask what was true and when.
Learning level
Document, key-value, and wide-column stores: model by query, not by entity, and choose embedding vs referencing.
Nodes, edges, and properties (Neo4j-style): how to model when the relationships between things are the real data.
Golden records, entity resolution, and reference data: one trusted version of customers, products, and accounts.
Domain-oriented data products, ownership, and federated governance for modeling at organizational scale.
Model features for machine learning with point-in-time correctness, avoiding leakage and training/serving skew.
Versioning, backward/forward compatibility, lineage, ownership, and quality, so models can change without breaking consumers.
A repeatable requirements-to-model method, OLTP vs OLAP structural decisions with the why, worked examples, and interview narration.
Partitioning, scalability under traffic, insert/update/history load mechanics, late-arriving data, tradeoffs, and a full star + snowflake model with sample queries.