QUERY SYSTEMS

Joins & Relationship Reasoning

Chapter 02EasyFoundations

Orientation

What You'll Master Here

Real data almost never lives in one table. A customer, their orders, and the items inside each order are stored separately and connected by keys. A join is how you put them back together to answer a question.

This chapter teaches joins as relationship reasoning, not syntax memorisation: pick the right join type, predict how many rows come back, recognise when a join quietly multiplies or drops rows, and prove your result with counts.

We stay on the same marketplace dataset from Chapter 1 — customers, orders, and order_items — so your attention is on how the tables relate, not on a new business domain.

By the end you should be able to look at two tables, name the relationship between them, choose INNER vs LEFT deliberately, and explain in plain language why the row count is what it is.

Why data engineers care

Nearly every metric a data engineer ships requires a join. The expensive mistakes are silent: an INNER JOIN that drops valid rows, or a one-to-many join that doubles revenue while still looking plausible.

Core mental model

A join answers one question: for each row on one side, which rows on the other side relate to it — and what should happen when none do?

A join adds related columns from another table
Input data
orders4 rows
order_idbuyer_idtotal_amount
1001180.00
10023120.00
1003440.00
10041220.00

buyer_id is the foreign key pointing at customers.customer_id.

customers3 rows
customer_idcountry
1US
3GB
4IN
select
  o.order_id,
  o.total_amount,
  c.country as buyer_country
from orders as o
join customers as c
  on c.customer_id = o.buyer_id
order by o.order_id
limit 4;
Query result4 rows
order_idtotal_amountbuyer_country
100180.00US
1002120.00GB
100340.00IN
1004220.00US

orders.buyer_id points at customers.customer_id; the join follows that pointer to attach country.

Each order keeps its own columns and gains the buyer’s country from the related customers row.

customers

one row per customer

key: customer_id

orders

one row per order

key: order_id (buyer_id → customers)

order_items

one row per item line

key: order_id + sku (order_id → orders)

One customer has many orders; one order has many item lines. The foreign key always lives on the “many” side: orders.buyer_id points back to customers, order_items.order_id points back to orders.

Key terms
join
Combining rows from two tables based on a matching condition (usually a key).
key
The column(s) you match on, e.g. customer_id linking orders back to customers.
cardinality
How many rows on one side relate to a row on the other: one-to-one, one-to-many, or many-to-many.
match
A pair of rows whose join condition is true. Rows with no match are the heart of LEFT/anti joins.

Common mistake

Thinking a join only adds columns.

A join can also add rows (one-to-many) or remove rows (INNER with no match). It changes the row set, not just the width.

Better habit

  • Name the relationship before writing the join.
  • Predict the row count before you run it.
  • Decide what should happen to rows that have no match.
Interview note

Strong join answers start with the relationship: “Each order belongs to one customer, so I join orders to customers on buyer_id, and the order grain is preserved.”

Study tip

Use the topic menu on the left as a checklist. Each topic is a join habit you should be able to demonstrate, not just recognise.

Remember this

A join is relationship reasoning. Before syntax, know the relationship, the join key, and what happens when a row has no match.