D8LooP - Practise Data Engineering

Orientation

What You'll Master Here

Validation is how a Python data job proves that output rows are trustworthy before they reach a file, API, database, or warehouse.

This chapter turns earlier habits into a full contract system: required fields, field rules, record rules, batch rules, rejected rows, warnings, fatal errors, exception context, manifests, and tests.

The goal is not to make code noisy. The goal is to make failure paths explicit enough that bad data is stopped, explained, and measured.

Why data engineers care

Silent validation failures become bad metrics, unsafe upserts, broken partitions, and expensive incident investigations.

Core mental model

Validate shape, validate fields, validate records, validate the batch, then write only safe output plus evidence.

schema check: columns exist
field rules: types and ranges
record rules: row is coherent
batch rules: duplicates and totals
output: safe rows + evidence
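
The stages above can be sketched as one small function chain. This is a minimal illustration, not a reference implementation: the orders columns, the rules, and names such as `FatalBatchError` and `validate_batch` are assumptions made up for the example.

```python
class FatalBatchError(Exception):
    """Raised when the whole batch must stop (hypothetical name)."""


# Assumed contract for an illustrative orders feed.
REQUIRED_COLUMNS = {"order_id", "quantity", "status", "shipped_at"}


def check_schema(rows):
    """Schema check: every required column exists, otherwise stop the run."""
    for index, row in enumerate(rows, start=1):
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            raise FatalBatchError(f"row {index} is missing columns: {sorted(missing)}")


def field_errors(row):
    """Field rules: types and ranges of single values."""
    try:
        quantity = int(row["quantity"])
    except (TypeError, ValueError):
        return ["quantity is not an integer"]
    if quantity <= 0:
        return ["quantity must be positive"]
    return []


def record_errors(row):
    """Record rules: the row is coherent across its fields."""
    if row["status"] == "shipped" and not row["shipped_at"]:
        return ["shipped order has no shipped_at"]
    return []


def validate_batch(rows):
    """Return (accepted, rejected); each rejected entry carries its reasons."""
    check_schema(rows)
    accepted, rejected, seen_ids = [], [], set()
    for index, row in enumerate(rows, start=1):
        reasons = field_errors(row) + record_errors(row)
        # Batch rule: an order_id may appear only once per batch.
        if row["order_id"] in seen_ids:
            reasons.append("duplicate order_id")
        seen_ids.add(row["order_id"])
        if reasons:
            rejected.append({"row_number": index, "row": row, "reasons": reasons})
        else:
            accepted.append(row)
    return accepted, rejected
```

Only `accepted` would be written to the target. `rejected` is the evidence that travels with it.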

Key terms

rejected row: A row excluded from accepted output, recorded with a structured reason and source context.
warning: A non-blocking quality signal that is reported while the row may continue.
fatal error: A batch-level problem that should stop the run, such as missing required columns.
contract evidence: Counts and reports proving what validation accepted, rejected, warned, or stopped.
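
One way to make these four terms concrete is to give each its own type. The sketch below is an assumption about how that could look: the class names, field names, and the shape of the evidence dictionary are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RejectedRow:
    """A row kept out of accepted output, with reason and source context."""
    source_file: str
    line_number: int
    reason: str
    raw_row: dict


@dataclass(frozen=True)
class QualityWarning:
    """A non-blocking signal; the row may still be accepted."""
    source_file: str
    line_number: int
    message: str


class FatalBatchError(Exception):
    """A batch-level problem that should stop the run."""


def contract_evidence(accepted, rejected, warnings):
    """Summarise what validation accepted, rejected, and warned about."""
    return {
        "accepted_count": len(accepted),
        "rejected_count": len(rejected),
        "warning_count": len(warnings),
        # Grouping by reason shows which rule is doing the rejecting.
        "rejections_by_reason": dict(Counter(r.reason for r in rejected)),
    }
```

Keeping warnings and rejections as separate types makes it hard to treat a blocking problem as a mere note by accident.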

Common mistake

Catching every error and continuing without evidence.

The job appears resilient while silently losing correctness.

Better habit

Separate warning, rejection, and fatal paths.
Attach file, line, field, value, and reason to failures.
Test each validation rule with tiny examples.
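
A single field rule written this way might look like the sketch below. The `amount` rule, the `Severity` enum, and the finding dictionary are illustrative assumptions. The fatal path is not shown here; it would be an exception raised at the schema step. The tests are plain functions with bare `assert` statements, so they can be called directly or collected by a test runner such as pytest.

```python
from enum import Enum


class Severity(Enum):
    WARNING = "warning"   # report it, the row may continue
    REJECT = "reject"     # exclude the row from accepted output


def check_amount(row, source_file, line_number):
    """Return a list of findings for the amount field (empty means clean)."""
    raw = row.get("amount")

    def finding(severity, reason):
        # Every finding carries file, line, field, value, and reason.
        return {"severity": severity, "file": source_file, "line": line_number,
                "field": "amount", "value": raw, "reason": reason}

    try:
        amount = float(raw)
    except (TypeError, ValueError):
        return [finding(Severity.REJECT, "amount is not a number")]
    if amount < 0:
        return [finding(Severity.REJECT, "amount is negative")]
    if amount == 0:
        return [finding(Severity.WARNING, "amount is zero")]
    return []


def test_valid_amount_has_no_findings():
    assert check_amount({"amount": "12.50"}, "orders.csv", 2) == []


def test_non_numeric_amount_is_rejected_with_context():
    (found,) = check_amount({"amount": "abc"}, "orders.csv", 3)
    assert found["severity"] is Severity.REJECT
    assert (found["file"], found["line"], found["value"]) == ("orders.csv", 3, "abc")


def test_zero_amount_is_only_a_warning():
    (found,) = check_amount({"amount": "0"}, "orders.csv", 4)
    assert found["severity"] is Severity.WARNING
```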

What to say

I would validate schema first, then field and record rules, emit rejected rows with source context, stop on fatal batch errors, and write a manifest that reconciles counts.
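
The reconciling manifest in that answer could be as small as the sketch below. The function name, the manifest keys, and the example counts are assumptions for illustration. Warnings are counted but left out of the reconciliation, because a warned row still ends up accepted.

```python
import json


def build_manifest(source_file, input_count, accepted_count, rejected_count, warning_count):
    """Build a run manifest and refuse to report counts that do not add up."""
    if accepted_count + rejected_count != input_count:
        raise ValueError(
            f"{source_file}: {input_count} rows read but "
            f"{accepted_count} accepted + {rejected_count} rejected"
        )
    return {
        "source_file": source_file,
        "input_rows": input_count,
        "accepted_rows": accepted_count,
        "rejected_rows": rejected_count,
        "warnings": warning_count,
    }


# Made-up counts, only to show the shape of the manifest.
manifest = build_manifest("orders.csv", 100, 97, 3, 5)
print(json.dumps(manifest, indent=2))
```

If a row vanishes between reading and writing, the counts stop adding up and the manifest step fails loudly instead of publishing a short file.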

Practice prompts

List which failures in an orders file should reject a row versus stop the run.
Design a rejected-row schema.

Remember this

Validation is not just defensive code. It is the data contract made executable.

Validation, Contracts & Error Handling
