Chapter 07
Orientation: SQL as a table-maintenance language
Analytical SQL asks a question and returns an answer. Data-engineering SQL keeps a warehouse table correct after thousands or millions of upstream changes arrive. That shift changes the job: the query is no longer just a report, it is a contract for how target state evolves.
This chapter treats SQL as the language of safe table maintenance. You will learn how to decide which rows are new, changed, unchanged, deleted, late, or historically important before a pipeline mutates a target table.
A source table is where new data arrives. A target table is the trusted warehouse table users query. A staging query sits between them: it cleans the source, removes bad keys, chooses one row per business key, and prepares an action plan.
A business key is the real-world identifier for one entity, such as `customer_id`. “One row per business key” means customer 104 should appear once in the target current table, even if the source sent customer 104 twice in one batch.
Why data engineers care
Production tables are reused by dashboards, ML features, finance workflows, and downstream models. A bad merge key, a missing replay window, or an accidental append-only model can silently corrupt every consumer.
Common mistake
Writing the mutation before proving the action plan.
You discover duplicate keys, tombstones, or changed attributes only after the target is already wrong.
Better habit
- Start with a SELECT that classifies actions per key.
- Count actions before mutating.
- Keep the source slice, target comparison, and final action labels in separate CTEs.
- Always explain at least one output row aloud: what source row met what target row, and why did that produce this action?
A senior load query is auditable: someone can read it and explain why each row will be inserted, updated, deleted, or ignored.
