Learning level
Think in records, transforms, contracts, failure paths, and reviewable code instead of isolated syntax drills.
Use lists, dicts, sets, and tuples for grouping, lookup maps, dedupe keys, and nested records in real transforms.
Clean, parse, and normalize messy text: encodings, whitespace, casing, delimiters, and regex for real-world fields.
Read files and payloads safely with pathlib, CSV, JSON, NDJSON, encodings, manifests, and malformed-row handling.
Handle None, datetime values, time zones, Decimal math, rounding, optional fields, and type hints deliberately.
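As a rough illustration of how several topics at this level fit together, here is a minimal sketch that normalizes a text field, uses it as a dedupe key, and parses amounts with Decimal while treating missing values explicitly. The field names (`email`, `amount`) and the helper names are hypothetical, chosen only for the example.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP


def normalize_email(raw):
    """Trim whitespace and lowercase; return None for missing or blank values."""
    if raw is None:
        return None
    cleaned = raw.strip().lower()
    return cleaned or None


def parse_amount(raw):
    """Parse a money string into a Decimal rounded to cents; None if unparseable."""
    if raw is None:
        return None
    try:
        value = Decimal(raw.strip().replace(",", ""))
    except InvalidOperation:
        return None
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def dedupe(records):
    """Keep the first record per normalized email; skip records with no usable key."""
    seen = set()
    kept = []
    for rec in records:
        key = normalize_email(rec.get("email"))
        if key is None or key in seen:
            continue
        seen.add(key)
        kept.append({"email": key, "amount": parse_amount(rec.get("amount"))})
    return kept


# Hypothetical messy input: stray whitespace, mixed case, a missing key, a bad amount.
rows = [
    {"email": "  Ana@Example.com ", "amount": "1,234.565"},
    {"email": "ana@example.com", "amount": "10"},
    {"email": None, "amount": "5"},
    {"email": "bo@example.com", "amount": "n/a"},
]
clean = dedupe(rows)
```

Returning None for an unparseable amount, rather than 0, keeps "unknown" distinct from "zero" in later aggregations. Whether first-seen or last-seen should win during dedupe is a design choice that depends on the source data.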
Learning level
Structure transform code with small functions, modules, config objects, environment rules, and dependency boundaries.
Process data that does not fit in memory using lazy evaluation, yield, itertools, and streaming file pipelines.
Use map/filter/reduce-style flows, in-memory joins, aggregations, stateful scans, and sorted-window logic.
Move data between CSV, JSON, Parquet, Avro, and ORC: row vs columnar, compression, and schema-aware reads/writes.
Design required-field checks, schema validation, rejected-row reports, warnings, exceptions, and contract evidence.
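To make the streaming and validation topics at this level concrete, the following sketch chains two generators: one reads CSV rows lazily, the other enforces a required-field check and records rejected rows instead of failing the whole run. The contract (`REQUIRED`), the sample data, and the function names are assumptions made for illustration.

```python
import csv
import io

# Hypothetical contract: every row must carry a non-blank id and amount.
REQUIRED = ("id", "amount")


def read_rows(handle):
    """Yield one dict per CSV row without loading the whole file into memory."""
    yield from csv.DictReader(handle)


def validate(rows, rejects):
    """Yield rows that satisfy the contract; append a report entry for the rest."""
    # The header is line 1, so data starts at line 2. This count assumes
    # no quoted field spans multiple physical lines.
    for line_no, row in enumerate(rows, start=2):
        missing = [f for f in REQUIRED if not (row.get(f) or "").strip()]
        if missing:
            rejects.append({"line": line_no, "missing": missing})
            continue
        yield row


# An in-memory stand-in for a file opened with open(path, newline="").
sample = io.StringIO("id,amount\n1,10\n,20\n3,\n4,40\n")
rejects = []
valid = list(validate(read_rows(sample), rejects))
```

Because both stages are generators, only one row is held at a time until the final `list(...)` call. A real pipeline would typically write valid rows to the output as they arrive instead of collecting them.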
Learning level
Pull from APIs and databases with pagination, retries, rate limits, secrets boundaries, and idempotent source reads.
Choose threads, processes, or asyncio with the GIL in mind, then profile and tune before reaching for a cluster.
Test with pytest-style fixtures and golden outputs, and add logging, metrics, and traceable failures for production confidence.
Build cursor, watermark, replay-window, checkpoint, manifest, and task-boundary habits for rerunnable jobs.
Know when to move from pure Python to pandas, Polars, or Spark, and how to narrate those tradeoffs in interviews.
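The watermark and rerunnable-job ideas at this level can be sketched in a few lines. This is a simplified, in-memory illustration: the `state` dict stands in for a persisted checkpoint, and the record shape (`id`, `updated_at`) and function names are hypothetical.

```python
from datetime import datetime, timezone


def select_new(records, watermark):
    """Return records strictly newer than the watermark, oldest first."""
    fresh = [r for r in records if r["updated_at"] > watermark]
    return sorted(fresh, key=lambda r: r["updated_at"])


def run_once(records, state):
    """Process one incremental batch and advance the watermark afterwards."""
    batch = select_new(records, state["watermark"])
    # A real job would load the batch here, and only then move the watermark,
    # so that a failed run is retried from the same point.
    if batch:
        state["watermark"] = batch[-1]["updated_at"]
    return batch


# Hypothetical source rows with timezone-aware UTC timestamps.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
    {"id": 3, "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
]
state = {"watermark": datetime(2024, 1, 1, tzinfo=timezone.utc)}

first = run_once(source, state)   # picks up ids 3 and 2
second = run_once(source, state)  # nothing new, so the rerun is a no-op
```

A strict `>` comparison can miss rows that share the watermark timestamp but arrive late. Replay windows address this by re-reading a short overlap and deduplicating on a key.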