D8LooP - Practise Data Engineering

Orientation

What You'll Master Here

Serialization is the moment Python values become bytes or text that another system must read. That boundary deserves the same care as validation or schema design.

This chapter covers CSV, JSON, NDJSON, compression, row-oriented formats, columnar formats, Parquet, Avro, ORC, partitioned layouts, and file manifests.

The standard-library examples are executable Python. The Parquet, Avro, and ORC sections explain the data engineering tradeoffs and where dedicated libraries or engines usually come in.

Why data engineers care

A perfectly normalized record can still fail downstream if Decimal and datetime values, headers, compression, schema, or partition layout are handled carelessly at write time.
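Python's standard `json` module shows the Decimal and datetime problem directly: by default, `json.dumps` raises `TypeError` for both types. Below is a minimal sketch of a deliberate policy. It assumes an illustrative record and a hypothetical `encode_value` hook passed through `json.dumps(..., default=...)`. Decimal becomes a string so no digits are lost to float rounding, and only timezone-aware datetimes are accepted.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal

# Illustrative record; field names are hypothetical.
record = {
    "order_id": "A-1001",
    "amount": Decimal("19.99"),
    "created_at": datetime(2024, 3, 1, 12, 30, tzinfo=timezone.utc),
}

# json.dumps(record) on its own raises TypeError: Decimal is not JSON serializable.


def encode_value(value):
    """Hypothetical policy hook: Decimal -> exact string, aware datetime -> ISO 8601."""
    if isinstance(value, Decimal):
        return str(value)  # keeps "19.99" exactly; float() could introduce rounding
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("naive datetime; attach a timezone before serializing")
        return value.isoformat()
    raise TypeError(f"no serialization policy for {type(value).__name__}")


payload = json.dumps(record, default=encode_value, sort_keys=True)
print(payload)
# {"amount": "19.99", "created_at": "2024-03-01T12:30:00+00:00", "order_id": "A-1001"}
```

Writing the amount as a JSON string is one choice among several. If a consumer expects a JSON number instead, the schema should state that explicitly so both sides agree.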

Core mental model

Serialization is an output contract: values, field names, ordering, encoding, compression, schema, and file evidence.

domain values (Decimal / datetime) → serializer (policy) → bytes or text (file payload) → manifest (evidence)
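The flow above can be sketched with the standard library alone. This is a minimal sketch that assumes a hypothetical `orders` schema written as CSV. The names `SCHEMA_VERSION`, `FIELDS`, `to_text`, `serialize_csv`, `build_manifest`, the path `orders.csv`, and the manifest field set are all illustrative placeholders.

```python
import csv
import hashlib
import io
from datetime import datetime, timezone
from decimal import Decimal

SCHEMA_VERSION = "orders.v1"  # hypothetical schema identifier
FIELDS = ["order_id", "amount", "created_at"]  # explicit column order is part of the contract


def to_text(value):
    """Serializer policy: exact Decimal digits, aware datetimes as ISO 8601 in UTC."""
    if isinstance(value, Decimal):
        return str(value)
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("naive datetime rejected by policy")
        return value.astimezone(timezone.utc).isoformat()
    return value


def serialize_csv(rows):
    """Domain values -> CSV text -> UTF-8 bytes (the file payload)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({name: to_text(row[name]) for name in FIELDS})
    return buffer.getvalue().encode("utf-8")  # encoding chosen explicitly, not inherited


def build_manifest(path, payload, row_count):
    """File-level evidence about the payload (hypothetical field set)."""
    return {
        "path": path,
        "format": "csv",
        "row_count": row_count,
        "schema_version": SCHEMA_VERSION,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "status": "written",
    }


rows = [
    {"order_id": "A-1001", "amount": Decimal("19.99"),
     "created_at": datetime(2024, 3, 1, 12, 30, tzinfo=timezone.utc)},
    {"order_id": "A-1002", "amount": Decimal("5.00"),
     "created_at": datetime(2024, 3, 1, 13, 5, tzinfo=timezone.utc)},
]
payload = serialize_csv(rows)
manifest = build_manifest("orders.csv", payload, len(rows))
print(payload.decode("utf-8"), end="")
print(manifest["row_count"], manifest["schema_version"])
```

Nothing is written to disk here. A real job would write `payload` to its destination first and record the manifest only after the write succeeds.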

Key terms

serialization: Converting in-memory values into a file, stream, or wire representation.
row-oriented: Records stored one row at a time, common in CSV, JSON, and NDJSON.
columnar: Values stored by column, useful for analytical scans and compression.
manifest: File-level evidence such as path, format, row count, schema version, and status.
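To make the row-oriented vs columnar distinction concrete, here is a toy pivot in plain Python. It only illustrates the layout idea and assumes a small list of hypothetical event dicts. Real columnar formats such as Parquet or ORC add per-column encodings, compression, and metadata on top of this basic arrangement.

```python
import json

# Hypothetical event records, held row by row as a list of dicts.
rows = [
    {"event_id": 1, "country": "DE", "amount": 10},
    {"event_id": 2, "country": "DE", "amount": 25},
    {"event_id": 3, "country": "FR", "amount": 7},
]

# Row-oriented: NDJSON stores one complete record per line.
ndjson_text = "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)

# Columnar: pivot so all values of one column sit next to each other.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An analytical scan such as "total amount" only has to read one column.
total_amount = sum(columns["amount"])

print(ndjson_text, end="")
print(columns["country"])  # ['DE', 'DE', 'FR']
print(total_amount)        # 42
```

Grouping similar values together, like the repeated `"DE"` above, is part of why columnar layouts tend to compress well and why scans that touch only a few columns can skip the rest.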

Common mistake

Treating file writing as an afterthought.

Downstream jobs then discover type and schema issues only after the data has landed.

Better habit

Choose format by consumer and access pattern.
Serialize Decimal and datetime deliberately.
Write a manifest for every produced file.

What to say

I would pick the format based on consumer, schema needs, row vs column access, compression, and partition layout, then emit a manifest with row counts and schema version.

Practice prompts

Choose a format for raw API events and explain why.
List manifest fields for a partitioned export.
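For the partitioned-export prompt, one possible sketch to compare against is shown below. It assumes Hive-style `key=value` directory names, gzip-compressed NDJSON files, and a hypothetical `export_partitioned` helper. The manifest fields, the `part-00000` file name, and `SCHEMA_VERSION` are illustrative choices, not a fixed standard.

```python
import gzip
import json
import tempfile
from collections import defaultdict
from pathlib import Path

SCHEMA_VERSION = "events.v1"  # hypothetical schema identifier


def export_partitioned(rows, root, partition_key="event_date"):
    """Write one gzip NDJSON file per partition value; return manifest entries."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)

    manifest = []
    for value in sorted(groups):
        part_dir = Path(root) / f"{partition_key}={value}"  # Hive-style partition folder
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part-00000.ndjson.gz"
        with gzip.open(path, "wt", encoding="utf-8") as handle:
            for row in groups[value]:
                handle.write(json.dumps(row, sort_keys=True) + "\n")
        manifest.append({
            "path": path.relative_to(root).as_posix(),
            "format": "ndjson",
            "compression": "gzip",
            "partition": {partition_key: value},
            "row_count": len(groups[value]),
            "schema_version": SCHEMA_VERSION,
            "status": "written",
        })
    return manifest


rows = [
    {"event_id": 1, "event_date": "2024-03-01", "type": "click"},
    {"event_id": 2, "event_date": "2024-03-01", "type": "view"},
    {"event_id": 3, "event_date": "2024-03-02", "type": "click"},
]
with tempfile.TemporaryDirectory() as root:
    manifest = export_partitioned(rows, root)
    for entry in manifest:
        print(entry["path"], entry["row_count"])
```

Keeping one manifest entry per partition file lets a downstream job check row counts and schema versions per partition before it reads any data.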

Remember this

File format choice is data architecture, not a save-as detail.

Serialization & File Formats

What You'll Master Here