DATA TRANSFORMS

Concurrency, Parallelism & Performance

Chapter 12 · Advanced · Performance

Orientation

Make It Fast On Purpose

When a job is too slow, the temptation is to immediately reach for threads or more machines. That is the wrong first move. Speed work has an order: measure to find the real bottleneck, fix it with the cheapest tool that fits, and only then consider concurrency or a cluster. Guessing wastes days optimizing code that was never slow.
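To make "measure first" concrete, here is a minimal sketch using the standard library's cProfile and pstats modules. The job itself (run_job, slow_lookup) is a made-up stand-in, not code from any real pipeline; the point is only the shape of the measurement step.

```python
import cProfile
import io
import pstats


def slow_lookup(rows, wanted):
    # Hypothetical hot spot: "r in wanted" scans a list, so each test is O(n).
    return [r for r in rows if r in wanted]


def run_job():
    # Hypothetical job used only to give the profiler something to measure.
    rows = list(range(2_000))
    wanted = list(range(0, 2_000, 2))
    return slow_lookup(rows, wanted)


def profile(func):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile(run_job)
    print(report)
```

Read the report from the top: the function with the largest cumulative time is the first candidate to fix, and anything that barely registers is not worth touching yet.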

The second thing to get straight is vocabulary, because two words get used interchangeably and cost people real time. Concurrency is dealing with many tasks by interleaving them on one worker; parallelism is doing many tasks at the same instant on many workers. They solve different problems, and choosing the wrong one is why "I added threads and nothing got faster" is such a common complaint.

This chapter gives you a decision you can defend: profile first, classify the work as I/O-bound or CPU-bound, then pick threads, processes, asyncio, or scaling out for a concrete reason, not a hunch.

Why data engineers care

Optimizing without measuring burns time on the wrong code, and applying the wrong concurrency model adds complexity with no speedup. Doing it in order is what makes a job faster instead of just more complicated.

Core mental model

Measure, then pick the smallest tool that fits the bottleneck. Concurrency is interleaving; parallelism is simultaneity; they are not the same.

Concurrency (1 worker, interleaved)

A▓ B░ A▓ B░ A▓ B░

One worker switches between tasks while they wait. Great when tasks mostly wait (I/O).

Parallelism (N workers, at once)

A▓▓▓▓▓▓
B▓▓▓▓▓▓

Multiple workers run truly simultaneously. Needed when tasks actually compute (CPU).

Concurrency is dealing with many things at once; parallelism is doing many things at once. Picking the wrong one is why "I added threads and it got no faster" happens.
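The interleaving picture can be reproduced in a few lines. This is a small sketch with asyncio, assuming two made-up tasks named A and B; asyncio.sleep(0) stands in for a real I/O wait and simply hands control back to the event loop.

```python
import asyncio


async def task(name, steps, log):
    for i in range(steps):
        log.append(f"{name}{i}")   # do a small piece of work
        await asyncio.sleep(0)     # stand-in for an I/O wait: yield to the loop


async def main():
    log = []
    # One thread, one event loop: the two tasks take turns at each await.
    await asyncio.gather(task("A", 3, log), task("B", 3, log))
    return log


def run_interleaved():
    return asyncio.run(main())


if __name__ == "__main__":
    print(run_interleaved())
```

The log comes back as A0, B0, A1, B1, A2, B2: a single worker switching between tasks, which is concurrency. Nothing here ran at the same instant, so a task that computes instead of waiting would gain nothing from this arrangement.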

Key terms
concurrency
Structuring work so many tasks make progress by interleaving, ideal when tasks wait.
parallelism
Running many tasks at the same instant on multiple cores, needed when tasks compute.
I/O-bound
Work limited by waiting (network, disk); the CPU is mostly idle.
CPU-bound
Work limited by computation; the CPU is the bottleneck.
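One rough way to tell the two apart is to compare CPU time with wall-clock time for the same call. The sketch below assumes a 0.5 cutoff, which is an arbitrary illustration rather than a standard threshold, and the two sample functions are invented for the demo. Note that time.process_time() counts CPU time for the whole process, so the ratio is only meaningful when nothing else in the process is busy.

```python
import time


def cpu_fraction(func, *args):
    """Return CPU time divided by wall time for one call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def classify(func, *args, cutoff=0.5):
    # cutoff is an assumed, illustrative threshold, not an established rule.
    return "cpu" if cpu_fraction(func, *args) >= cutoff else "io"


def waits():
    time.sleep(0.2)  # stand-in for a network or disk wait


def computes():
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


if __name__ == "__main__":
    print("waits   ->", classify(waits))
    print("computes ->", classify(computes))
```

A ratio near zero means the code spent its time waiting; a ratio near one means the CPU was the limit.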

Common mistake

Adding threads or multiprocessing before profiling.

You add complexity and bugs while the real bottleneck (often an algorithm or an I/O wait) is untouched.

Better habit

  • Profile before optimizing anything.
  • Classify the bottleneck as I/O-bound or CPU-bound first.
  • Pick the concurrency model from that classification, not from habit.
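As a sketch of the last habit, the classification can drive the choice directly. The helper names below (pick_executor, run_all, fake_fetch) are hypothetical; the executors are the real ones from concurrent.futures. In standard CPython builds the global interpreter lock keeps threads from running Python bytecode in parallel, which is why CPU-bound work is sent to processes here.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time


def pick_executor(bound):
    """Map a bottleneck classification to an executor class (hypothetical helper)."""
    if bound == "io":
        return ThreadPoolExecutor    # threads overlap the waiting
    if bound == "cpu":
        return ProcessPoolExecutor   # processes can use several cores
    raise ValueError(f"unknown bound: {bound!r}")


def run_all(work, items, bound, max_workers=4):
    """Apply work to every item with the executor chosen for this bound."""
    executor_cls = pick_executor(bound)
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(work, items))


def fake_fetch(n):
    # Stand-in for a network call: the thread sleeps and the CPU stays idle.
    time.sleep(0.05)
    return n * 2


if __name__ == "__main__":
    print(run_all(fake_fetch, range(8), bound="io"))
```

When you pass bound="cpu", the work function must be defined at module top level so it can be pickled, and the call should sit under the `if __name__ == "__main__":` guard on platforms that start workers by spawning.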

Interview note

Asked to speed up a job, lead with "first I would profile to find the bottleneck and check if it is I/O- or CPU-bound." Jumping straight to "use threads" is the junior answer.

The order that saves time

Measure -> fix the algorithm -> stream to cut memory -> add concurrency for the right bound -> scale out. Each step is cheaper than the next; stop as soon as it is fast enough.
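Steps two and three often need no concurrency at all. The sketch below uses invented names (read_ids, count_matches) to illustrate one algorithm fix, a set instead of a list for membership tests, and one streaming fix, a generator instead of a fully loaded list.

```python
from typing import Iterable, Iterator


def read_ids(lines: Iterable[str]) -> Iterator[int]:
    """Yield one parsed id at a time instead of building the whole list."""
    for line in lines:
        line = line.strip()
        if line:               # skip blank lines
            yield int(line)


def count_matches(lines: Iterable[str], wanted_ids: Iterable[int]) -> int:
    # Algorithm fix: set membership is O(1) on average, a list scan is O(n).
    wanted = set(wanted_ids)
    # Streaming fix: only one id is held in memory at a time.
    return sum(1 for row_id in read_ids(lines) if row_id in wanted)


if __name__ == "__main__":
    sample = ["1\n", "2\n", "\n", "3\n"]
    print(count_matches(sample, [2, 3, 9]))
```

Because lines can be any iterable, an open file object works in place of the sample list, and the file is never read into memory in one piece. If the job is fast enough after changes like these, stop here.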

Practice prompts

  • Define concurrency and parallelism in one sentence each.
  • List the speed-work steps in order and why each precedes the next.

Remember this

Speed is a process, not a reflex: measure first, classify the bottleneck, then choose concurrency or scale deliberately.