The first version of every video pipeline is a script. It downloads an input. It runs ffprobe. It runs ffmpeg once or twice. It uploads the outputs and fires a webhook. It's fifty lines of bash or two hundred lines of Python. It works.
The second version is the same script with retries and logging. The third version has if/else branches for live versus VOD. The fourth version is a state machine pretending to be a script — it's not really a script anymore, you just haven't admitted it yet. By the seventh version, you're maintaining a workflow engine that nobody calls a workflow engine, with all the bugs of one and none of the structure.
This post is the case for skipping that journey and modeling video pipelines as a directed acyclic graph from the start. It's not a free choice — DAGs come with their own costs — but for video specifically, the trade is good earlier than most teams think.
What scripts do well
Scripts deserve credit. For the first several months of a video product they are exactly the right shape:
- They're easy to read top-to-bottom
- They're easy to debug — you re-run them with different args and add print statements
- They have no infrastructure dependencies — your code, FFmpeg, a directory
- New engineers ramp up in an hour
- The blast radius of a change is small
If your video volume is steady, your input formats are constrained, and your outputs are uniform, a script can run in production for years. There are real, profitable companies whose entire video pipeline is a Python script behind a queue. Don't let an architecture blog post convince you that you have a problem you don't have.
Where scripts break
The breakages aren't sudden. They're a gradual loss of the properties that made the script good.
Branching outputs. The first time someone asks you to encode the same input into eight different renditions plus subtitles plus a thumbnail strip, your script grows a for loop. The first time those renditions need different settings, the loop grows config. The first time one rendition needs to depend on the output of another (e.g., the thumbnail strip references the 720p rendition's keyframes), your script grows ordering logic. None of these things break the script — they just slowly turn it into something nobody can read in one sitting.
Partial failure. A script that produces ten outputs has ten places to fail. When one fails, what's the script's state? Most scripts answer with "rerun the whole thing." This is wasteful when the failed output was the cheap one. It's also wrong when the rerun produces different outputs for the renditions that succeeded the first time, because the encoder version drifted, or because non-determinism in the codec changed something subtle.
Retry semantics. We've written elsewhere about why retries on FFmpeg are not trivial. The TL;DR: different failure classes want different retry behaviors. Implementing this in a script means a tree of try / except / classify / re-run that swallows the script's clarity.
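To make that concrete, here is a compressed sketch of where the script version tends to end up. The error strings, backoff choices, and failure classes are illustrative, not FFmpeg's documented contract, and a real script accumulates far more branches than this.

```python
import subprocess
import time

def run_ffmpeg_with_retries(cmd: list[str], max_attempts: int = 3) -> None:
    """Hand-rolled retry-and-classify logic, the kind that accretes in a script."""
    for attempt in range(1, max_attempts + 1):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode == 0:
            return
        stderr = proc.stderr.lower()
        if "invalid data found" in stderr:
            # Bad or truncated input: no amount of retrying will fix it.
            raise RuntimeError(f"unrecoverable input error: {proc.stderr[-300:]}")
        if "connection reset" in stderr or "timed out" in stderr:
            # Probably a flaky network source: back off and try again.
            time.sleep(2 ** attempt)
            continue
        if attempt == max_attempts:
            break
        # Unknown failure class: retry immediately and hope.
    raise RuntimeError(f"ffmpeg failed after {max_attempts} attempts: {cmd}")
```

Multiply this by every stage in the pipeline and the top-to-bottom readability that made the script attractive is gone.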
Provenance. "Why does this output have these encoder parameters?" The script knows, but the audit log doesn't. Adding observability to a script means writing log lines at every interesting boundary, by hand. Forgetting one is a silent bug — you find out three months later when a customer asks a question and the answer isn't in the database.
Composition. Two scripts that each work in isolation don't compose. If team A's pipeline produces output that's now an input for team B's pipeline, the obvious thing — calling A from B — turns into "B's script runs A's script, captures stdout, parses it" and the failure modes multiply.
None of these are showstoppers. All of them slowly erode the things scripts were good for: clarity, debuggability, ramp-up, blast radius.
What a DAG buys you
A directed acyclic graph models the pipeline as nodes (stages) connected by edges (data flow). Each stage takes typed inputs, produces typed outputs, and runs independently. The graph is declarative — you describe the relationships, not the order — and the engine figures out execution order from the dependencies.
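A minimal sketch of what "typed inputs and outputs" means in practice. The class and function names here are invented for illustration rather than taken from any particular framework.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceMedia:
    uri: str                  # where the ingested input lives

@dataclass(frozen=True)
class ProbeReport:
    duration_s: float
    video_codec: str
    height: int

@dataclass(frozen=True)
class Rendition:
    uri: str
    height: int
    bitrate_kbps: int

# Each stage is a function from typed inputs to typed outputs. It knows
# nothing about ordering, retries, or what runs before or after it; the
# graph is built from these signatures plus the declared edges.
def probe(src: SourceMedia) -> ProbeReport:
    ...

def encode(src: SourceMedia, report: ProbeReport, target_height: int) -> Rendition:
    ...
```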
The properties that fall out of this structure, almost for free:
Partial-failure handling becomes a graph problem. When stage 7 fails, the engine knows exactly which stages 7 depends on (don't rerun those) and which stages depend on 7 (skip until 7 succeeds). You don't write retry logic per stage — you write retry policy per stage type, and the engine routes failures.
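One way to picture "retry policy per stage type": the policy lives in a single table keyed by stage type, and the engine applies it wherever that type appears in a graph. The names, fields, and failure classes below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int
    backoff_s: float
    retry_on: tuple[str, ...]   # failure classes that are worth retrying

# Keyed by stage *type*, not by individual stage instance. Failure-class
# names like "transient_io" are placeholders for a real taxonomy.
RETRY_POLICIES = {
    "encode":  RetryPolicy(max_attempts=3, backoff_s=30.0, retry_on=("transient_io", "worker_lost")),
    "package": RetryPolicy(max_attempts=2, backoff_s=10.0, retry_on=("transient_io",)),
    "emit":    RetryPolicy(max_attempts=5, backoff_s=5.0,  retry_on=("transient_io", "rate_limited")),
}
```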
The audit trail is the graph. Every edge is a data hand-off. Every node has timestamps, status, encoder version, parameters. Reconstructing "what happened to job X" is reading the graph, not parsing log files. Provenance stops being an afterthought and starts being the data structure.
Composition is the model. Stage A's output is Stage B's input by graph edge, not by string parsing. Two pipelines compose by connecting their boundaries — no special integration code, just edges.
Parallelism is automatic. The engine sees that stages 4, 5, and 6 don't depend on each other and runs them in parallel. You don't write threading code; you write the graph and the engine schedules.
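To see how the parallelism falls out of the dependencies alone, here is a toy version using Python's standard-library graphlib. A real runtime dispatches the ready stages to workers instead of marking them done in-process; the stage names are placeholders.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Stage -> set of stages it depends on. This mapping is the entire
# "orchestration" input; nothing below says what runs in what order.
deps = {
    "probe":        {"ingest"},
    "encode_720p":  {"probe"},
    "encode_1080p": {"probe"},
    "thumbnails":   {"probe"},
    "package":      {"encode_720p", "encode_1080p"},
}

ts = TopologicalSorter(deps)
ts.prepare()
while ts.is_active():
    ready = ts.get_ready()      # every stage whose dependencies are satisfied
    print("can run in parallel:", ready)
    for stage in ready:         # a real engine would hand these to workers
        ts.done(stage)
```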
Retries are localized. Stage 5 fails transiently. The engine retries just stage 5 — not the whole pipeline. This is a big deal for video specifically, because individual stages can take minutes-to-hours, and re-running everything from the top is genuinely expensive.
Observability is uniform. Every stage looks the same to the observer: input, output, timing, status, retries. Your dashboards work for any new pipeline you add, because the unit of observation is the same.
Notice none of these are about "modern architecture" or "decoupling." They're concrete operational properties that come from the structure.
What you give up
DAGs are not free. The honest list:
Indirection. A bug in production is no longer "read the script top-to-bottom." It's "look at the graph, find the failing node, look at its inputs, follow the edge backward." Engineers used to imperative debugging hate this for a few weeks before they don't.
Tooling tax. A script runs anywhere. A DAG needs a runtime. That runtime is a dependency you have to install, configure, monitor, and upgrade. If your video volume genuinely doesn't justify the runtime, the runtime is overhead.
Learning curve. Onboarding goes from "read this 200-line script" to "learn this graph language, learn the runtime, then read the graph." For a one-person team, this is a tax. For a four-person team, it's a sub-day investment that pays back in weeks.
Over-modeling temptation. Once you have a graph, every problem looks like a graph. The temptation to express everything as a DAG node — including things that don't need to be — is real. The discipline is "stage = something with measurable inputs, outputs, and failure semantics." Not "every line of code becomes a node."
If those costs read as obviously not-worth-it for your situation, stay on a script. We mean it.
What makes a video pipeline DAG different from a generic workflow engine
This is the part most "use a DAG" arguments get wrong. They assume that because Airflow, Temporal, Prefect, and Dagster exist, the answer to "I need a DAG" is "use one of those." For video, this is usually the wrong answer.
Generic workflow engines are designed for arbitrary computation. They optimize for:
- Long-running, mostly-IO-bound tasks (ETL, data movement)
- Heterogeneous workloads (Python, SQL, REST calls, all in one DAG)
- Operator authorship in user code
- Scheduling on cron-like triggers
- Scale measured in DAG runs per day, not per hour
Video has a different shape:
- Workloads are CPU- and GPU-bound, not IO-bound. Workers must be sized for compute, not parallel HTTP calls.
- Workloads are mostly the same shape. Encode, package, deliver. Mux/demux variants. The graph topology repeats with parameter variation, more than it diverges in structure.
- Failures have specific FFmpeg-flavored taxonomies. A general-purpose retry policy doesn't know that exit code 234 means "input contains an unsupported codec". (A sketch of a video-aware failure classifier follows this list.)
- Provenance is more than logging — it's auditable, often regulatory. Encoder version, container hash, preset version, all need to live on every artifact, not just in opt-in logs.
- Outputs are large binary blobs, not rows in a warehouse. Storage strategy, retention, partial-cleanup-on-failure, all matter at the framework level, not the user level.
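The exit-code example above is the post's own illustration; in practice classification leans on stderr patterns as much as exit codes. Here is a sketch of what a video-aware classifier might look like, with invented class names and a deliberately short rule list.

```python
import re

# Invented failure classes, stand-ins for whatever taxonomy the runtime defines.
PERMANENT, TRANSIENT, RESOURCE = "permanent", "transient", "resource"

# Example patterns only; a production classifier is built from observed
# failures and versioned alongside the FFmpeg builds it was tuned against.
RULES = [
    (re.compile(r"invalid data found when processing input", re.I), PERMANENT),
    (re.compile(r"unknown encoder|decoder .* not found", re.I),     PERMANENT),
    (re.compile(r"connection reset|operation timed out", re.I),     TRANSIENT),
    (re.compile(r"cannot allocate memory", re.I),                   RESOURCE),
]

def classify_failure(returncode: int, stderr: str) -> str:
    for pattern, failure_class in RULES:
        if pattern.search(stderr):
            return failure_class
    # Unknown failures default to transient with a capped retry budget,
    # so new failure modes surface quickly instead of retrying forever.
    return TRANSIENT
```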
You can build video on Airflow. People do. The cost is implementing the video-specific operational layer on top of a framework that doesn't know it's running video, and the result is a DAG runtime plus a parallel video-specific layer of conventions that the runtime can't enforce.
The case for a video-native DAG runtime is that the FFmpeg-flavored taxonomies, encoder version pinning, and large-binary partial-failure semantics are part of the framework instead of conventions on top of it.
What a video pipeline DAG looks like in practice
Concretely, a video pipeline DAG looks something like:
```
[ingest:s3] ──► [probe] ──┬─► [encode:240p] ──┐
                          ├─► [encode:480p] ──┤
                          ├─► [encode:720p] ──┼─► [package:hls] ──► [emit:cdn]
                          ├─► [encode:1080p] ─┤
                          ├─► [thumbnails] ───┘
                          └─► [captions:burn-in] ──► [emit:cdn]
```
The graph is declarative. You write "encode:1080p depends on probe; package:hls depends on all the encode:* outputs." The runtime decides:
- `probe` runs first
- The five encode and thumbnail stages run in parallel
- `package:hls` waits for the four encode outputs
- `emit:cdn` runs as the encodes and packaging settle
- If `encode:480p` fails transiently, only it retries — `package:hls` keeps waiting
- If `encode:480p` fails permanently, `package:hls` is skipped, the partial outputs are GC'd, and the customer gets a deterministic "rendition 480p failed:" event
You don't write any of that orchestration. You write the graph and the per-stage parameters.
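What "the graph and the per-stage parameters" can look like when written down, expressed here as plain data with invented field names. Note that nothing in it says what runs when, what retries, or what gets cleaned up on failure; that is all the runtime's job.

```python
pipeline = {
    "ingest":        {"type": "ingest:s3",   "deps": []},
    "probe":         {"type": "probe",       "deps": ["ingest"]},
    "enc_240p":      {"type": "encode",      "deps": ["probe"], "params": {"height": 240}},
    "enc_480p":      {"type": "encode",      "deps": ["probe"], "params": {"height": 480}},
    "enc_720p":      {"type": "encode",      "deps": ["probe"], "params": {"height": 720}},
    "enc_1080p":     {"type": "encode",      "deps": ["probe"], "params": {"height": 1080}},
    "thumbnails":    {"type": "thumbnails",  "deps": ["probe"]},
    "captions":      {"type": "captions",    "deps": ["probe"], "params": {"mode": "burn-in"}},
    "hls":           {"type": "package:hls", "deps": ["enc_240p", "enc_480p", "enc_720p", "enc_1080p"]},
    "emit_hls":      {"type": "emit:cdn",    "deps": ["hls"]},
    "emit_captions": {"type": "emit:cdn",    "deps": ["captions"]},
}
```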
The audit trail of a single job is the resolved graph: every node's start/end time, encoder version, parameters, output hash, retry history. To answer "why does this output look this way," you don't parse logs — you query the graph.
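For a sense of scale, one node of that resolved graph might carry something like the record below. The field names and values are illustrative, not a real schema.

```python
audit_node = {
    "stage": "enc_480p",
    "status": "succeeded",
    "attempts": 2,
    "started_at": "2025-03-02T10:02:11Z",
    "finished_at": "2025-03-02T10:09:47Z",
    "encoder": {"name": "libx264", "version": "pinned-build-id"},
    "params": {"height": 480, "crf": 23, "preset": "medium"},
    "inputs": ["sha256-of-source-artifact"],
    "output": {"uri": "s3://bucket/job/480p.mp4", "sha256": "sha256-of-output"},
    "retry_history": [{"attempt": 1, "failure_class": "transient_io"}],
}
# "Why does this output look this way?" becomes a lookup on records like
# this, joined along the graph's edges, rather than a grep through logs.
```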
FFmpeg as a service, done well, is a DAG runtime
If this all sounds like a lot, that's because it is. The DAG runtime, the FFmpeg-aware retry classifier, the audit trail, the partial-success handling, the encoder version pinning, the output GC — none of these is small. Implementing them well is a multi-quarter project for a small team.
This is most of what we've spent time building at MpegFlow. The bet is that for teams whose business is video — broadcasters, OTT platforms, archive shops — the operational layer is too important to be a side project, and the right answer is to use FFmpeg as a service, not a script with retries bolted on. We built the DAG runtime first because every other guarantee — provenance, partial-success handling, version pinning — is easier when the structure is right.
If that's interesting, the beta cohort is open. The encoder MVP is shipping this quarter, and the DAG runtime is what runs underneath it.
If you want to keep going on the operational layer specifically — what FFmpeg-in-production demands once you commit to running it yourself — our previous post covers the queue, retry, and audit-trail design in detail.
And if your real question is "should I run any of this myself, or stay on a managed service," that's the next post. Honest math, no marketing.