QUERY SYSTEMS

Time, Events & Semi-Structured Data

Chapter 06MediumEvents

Orientation

What You'll Master Here

Chapter 6 is about the tables data engineers actually meet in pipelines: product events, timestamps, JSON payloads, arrays, and records that arrive after the day they belong to.

You will separate event time from ingestion time, bucket timestamps safely, extract semi-structured fields, expand arrays into rows, and audit ordering and lateness.

The live labs use SQLite-compatible JSON and date functions, while the article calls out where warehouse dialects differ — the mental model stays stable even when syntax changes.

Why data engineers care

Event tables power product analytics, attribution, alerting, and pipeline audits. Small timestamp or payload mistakes shift metrics silently, with no error to catch them.

Core mental model

Separate four things: what happened, when it happened, when you saw it, and which payload details need to become columns.

Inspect the event stream
Input data
events6 rows
event_iduser_idevent_nameevent_timeingested_at
11view2026-01-01 03:302026-01-01 03:31
21checkout2026-01-01 04:102026-01-01 04:12
32view2026-01-01 22:152026-01-02 08:00
42checkout2026-01-02 00:302026-01-02 00:31
53view2026-01-02 05:002026-01-04 09:00
61refund2026-01-03 18:002026-01-03 18:05

Events 3 and 5 were ingested a day or more after they happened — that lag is the whole subject of this chapter.

select
  event_id,
  event_name,
  event_time,
  ingested_at
from events
order by event_time, ingested_at;
Query result6 rows
event_idevent_nameevent_timeingested_at
1view2026-01-01 03:302026-01-01 03:31
2checkout2026-01-01 04:102026-01-01 04:12
3view2026-01-01 22:152026-01-02 08:00
4checkout2026-01-02 00:302026-01-02 00:31
5view2026-01-02 05:002026-01-04 09:00
6refund2026-01-03 18:002026-01-03 18:05

Six events across three users. Keep this stream in mind — every example below queries it.

Two timestamps per event: when it happened (event_time) and when the pipeline saw it (ingested_at). They are not always close.

event_time

when the user action happened

ingested_at

when the pipeline saw it

processed_at

when the warehouse transformed it

Key terms
event_time
When the user or system action actually happened.
ingested_at
When the pipeline received the event.
payload
Semi-structured details, often JSON, attached to an event.
late-arriving data
A record ingested after the reporting window for its event_time.

Common mistake

Treating event time, ingestion time, and processing time as the same time.

Late data, timezone boundaries, and backfills become impossible to reason about.

Better habit

  • Name which timestamp you are using.
  • Order events deterministically.
  • Decide how late-arriving data should be handled.
Production note

Most event bugs are not syntax bugs. They are semantic bugs about time, ordering, payload shape, or whether a record arrived too late for its reporting window.

Study tip

Use the topic menu as a checklist. Each topic is an event-data habit you should be able to demonstrate on the stream above.

Remember this

Event SQL is trustworthy when time semantics and payload extraction are visible in the query.