Running FFmpeg in Kubernetes looks easy. It's a binary; you put it in a container; you schedule the container as a Pod. Done.
It is easy — for the first few hundred jobs. Then come the questions: how do retries work when a Pod is evicted mid-encode? How does a four-hour encode survive a node drain? How do you isolate one tenant's queue from another's without hard-coding pool names into your application? When does running FFmpeg as a Kubernetes Job give way to a worker pool, and when does the worker pool give way to a Kubernetes operator?
This post is the playbook. We've watched teams hit each transition the hard way; here's the shape of each pattern, the threshold where it stops working, and what to reach for next.
Pattern 0: a Kubernetes Job per encode (works for hundreds, breaks at thousands)
The simplest pattern: every encode is a `kind: Job`. Templated YAML, `kubectl apply`, FFmpeg runs to completion, the Job goes to `Succeeded` (or `Failed`).
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: encode-{{job_id}}
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: ffmpeg
          image: registry.example.com/ffmpeg:6.1
          args: ["-i", "{{input}}", "-c:v", "libx264", "{{output}}"]
```
What it gets right:
- Retry-on-failure semantics built in via the Job's `backoffLimit`
- Failures are visible in `kubectl get jobs`
- One-shot is the right cardinality for a one-shot encode
Where it breaks:
- Job creation is rate-limited at the Kubernetes API server. Tens of thousands of Job objects bog down etcd.
- No queue: every encode dispatches immediately or not at all.
- Image-pull churn: every Job pulls the FFmpeg image fresh unless you've configured an image pull policy carefully.
- No fine-grained pool isolation: tenant A's burst saturates the cluster, tenant B's encodes starve.
Threshold: works fine up to ~500 encodes/day. Above that, Pattern 1.
Pattern 1: a worker Deployment + queue (works to ~50K/day)
Now you separate dispatch from execution. Encodes land on a queue (Redis, RabbitMQ, NATS, SQS — pick the one your team already operates). A kind: Deployment of long-lived FFmpeg worker pods pulls jobs and runs them serially.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ffmpeg-workers
spec:
  replicas: 8
  selector:
    matchLabels:
      app: ffmpeg-worker
  template:
    metadata:
      labels:
        app: ffmpeg-worker
    spec:
      containers:
        - name: worker
          image: registry.example.com/ffmpeg-worker:1.4
          env:
            - name: QUEUE_URL
              value: "redis://queue:6379/0"
```
What this fixes:
- No more etcd churn; one worker pod runs many encodes serially.
- The queue gives you backpressure for free.
- Retries become a property of the queue (visibility timeout, max-deliveries), not of Kubernetes.
What still breaks:
- Replica count is static. If your queue has 5K jobs in it, your eight replicas are going to take a while.
- You can bolt on an HPA keyed to CPU or memory, but those metrics don't reflect queue depth; you're scaling on the symptom, not the cause.
- Hard tenant isolation requires running multiple Deployments by hand, with multiple queues, and routing logic in your application.
This pattern earns its keep up to about 50K encodes/day. Above that, Pattern 2.
Pattern 2: KEDA queue-depth autoscaling (works to ~500K/day)
KEDA is the Kubernetes Event-Driven Autoscaler — a controller that scales Deployments based on external metrics, including queue depth. Combine the queue from Pattern 1 with a ScaledObject and your worker count tracks job pressure directly.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ffmpeg-workers
spec:
  scaleTargetRef:
    name: ffmpeg-workers
  minReplicaCount: 0
  maxReplicaCount: 200
  triggers:
    - type: redis
      metadata:
        address: queue:6379
        listName: encodes
        listLength: "5"
```
What it fixes:
- Idle queue → workers scale to zero (or a minimum). Real cost savings for spiky workloads.
- Queue spike → workers scale up toward `maxReplicaCount` within seconds.
- Reactive autoscaling on the right signal: "are there jobs to run?" not "is CPU pegged?"
What still breaks above ~500K/day:
- Multi-tenant isolation is still flat: one queue, one Deployment. The tenant noise problem returns at the queue level — a noisy tenant fills the queue ahead of a quiet one.
- GPU and CPU pools require separate Deployments, separate queues, manual routing logic in your application layer.
- The lifecycle of "draining a pool for an upgrade without dropping in-flight encodes" is something you build by hand, every time.
This is where the operator pattern earns its keep.
Pattern 3: a video transcoder Kubernetes operator (multi-tenant production)
A Kubernetes operator is a controller that manages custom resources for your domain. For video transcoding, the operator manages a custom resource — call it WorkerPool — that represents a tenant + workload-shape combination.
```yaml
apiVersion: video.mpegflow.com/v1
kind: WorkerPool
metadata:
  name: tenant-acme-gpu
spec:
  tenant: acme
  workload: gpu-hevc
  queueRef: redis://queue:6379/acme-gpu
  scaling:
    min: 0
    max: 50
    metric: queueDepth
  resources:
    requests:
      nvidia.com/gpu: 1
```
What the operator gives you that hand-rolled patterns can't:
- One CRD per tenant or per workload class. The operator reconciles each into the right Deployment + ScaledObject + Service + RBAC scope. Adding a tenant becomes a CRD apply, not a YAML-template-and-pray exercise (a condensed reconcile sketch follows this list).
- Pool-level pause: cordon a `WorkerPool` to drain it for an upgrade without dropping in-flight encodes. Workers finish what they have, refuse new jobs, and the operator scales the pool down once empty.
- Leader election: the operator runs in HA. No single point of failure for the control plane.
- Per-pool routing: jobs go to the queue named in the WorkerPool spec, so your application layer doesn't need to know about pools at all. The operator keeps the topology.
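To make the reconcile idea concrete, here is a heavily condensed sketch of a `WorkerPool` handler built on the kopf framework and the official Kubernetes Python client. This is not MpegFlow's implementation; the group name comes from the CRD above, while the image, labels, queue naming, and everything else are illustrative, and a real operator would also handle updates, deletion, and ownership.

```python
# Condensed reconcile sketch (illustrative only): on WorkerPool creation,
# stamp out a matching Deployment and KEDA ScaledObject.
import kopf
from kubernetes import client, config

config.load_incluster_config()  # assumes an in-cluster service account

@kopf.on.create("video.mpegflow.com", "v1", "workerpools")
def create_pool(spec, name, namespace, **_):
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": spec["scaling"]["min"],
            "selector": {"matchLabels": {"pool": name}},
            "template": {
                "metadata": {"labels": {"pool": name}},
                "spec": {
                    "containers": [{
                        "name": "worker",
                        "image": "registry.example.com/ffmpeg-worker:1.4",
                        "env": [{"name": "QUEUE_URL", "value": spec["queueRef"]}],
                    }],
                },
            },
        },
    }
    scaledobject = {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"name": name},
            "minReplicaCount": spec["scaling"]["min"],
            "maxReplicaCount": spec["scaling"]["max"],
            "triggers": [{"type": "redis", "metadata": {"listName": f"{name}-queue"}}],
        },
    }
    client.AppsV1Api().create_namespaced_deployment(namespace, deployment)
    client.CustomObjectsApi().create_namespaced_custom_object(
        "keda.sh", "v1alpha1", namespace, "scaledobjects", scaledobject
    )
```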
This is the pattern we ship in MpegFlow's K8s deployment. Full reference: MpegFlow on Kubernetes with KEDA and the strict-broker security model that complements it.
What FFmpeg leaves to you regardless of which pattern you pick
Kubernetes solves the placement problem — where this particular FFmpeg invocation runs. It does not solve:
- Stderr parsing. FFmpeg writes progress on stderr in a format Prometheus does not natively understand. You parse it (a sketch follows this list).
- Partial-success handling. A six-rendition ABR ladder where rendition 4 OOM'd needs that one rendition retried on a higher-memory pool, not the whole job. A Kubernetes Pod restart reruns the entire encode.
- Audit trail. Kubernetes logs the Pod stdout/stderr. It does not record encoder version, container hash, parameters, input/output hashes — the things your compliance officer asks about.
- Idempotency. A Kubernetes `Job` retry will run FFmpeg twice with the same arguments. If your output path is the same, you'll write twice. Deterministic output naming and atomic upload are on you (also sketched below).
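For the stderr problem specifically, FFmpeg's `-progress` output is easier to consume than the human-readable stats line: it emits plain `key=value` blocks you can forward to whatever metrics layer you run. A minimal sketch; `report_progress` is a placeholder, and the Prometheus wiring is deliberately left out.

```python
# Sketch: read FFmpeg's machine-readable progress instead of scraping stderr.
# "-progress pipe:1" emits key=value lines on stdout (frame=, out_time=,
# speed=, progress=continue|end). report_progress is a placeholder.
import subprocess

def run_with_progress(input_url: str, output_path: str) -> int:
    proc = subprocess.Popen(
        [
            "ffmpeg", "-y", "-i", input_url, "-c:v", "libx264",
            "-progress", "pipe:1", "-nostats", output_path,
        ],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # a real worker keeps stderr for the audit trail
        text=True,
    )
    current = {}
    for line in proc.stdout:
        key, _, value = line.strip().partition("=")
        current[key] = value
        if key == "progress":         # marks the end of one progress block
            report_progress(current)  # placeholder: push to your metrics layer
            current = {}
    return proc.wait()

def report_progress(block: dict) -> None:
    print(f"out_time={block.get('out_time')} speed={block.get('speed')}")
```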
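And for idempotency, one workable shape is deterministic naming plus an atomic publish step, sketched below for a local filesystem target; the naming scheme and paths are illustrative, and an object-store upload would take the place of the final rename.

```python
# Sketch: deterministic output naming plus atomic publish, so a retried Job
# overwrites the same artifact instead of producing a second copy.
import hashlib
import os
import subprocess

def encode_idempotent(job_id: str, input_url: str, out_dir: str) -> str:
    # Name the output from stable job facts, not a timestamp or random suffix.
    digest = hashlib.sha256(f"{job_id}:{input_url}".encode()).hexdigest()[:16]
    final_path = os.path.join(out_dir, f"{digest}.mp4")
    tmp_path = os.path.join(out_dir, f"{digest}.partial.mp4")

    if os.path.exists(final_path):
        return final_path  # a previous attempt already published this output

    subprocess.run(
        ["ffmpeg", "-y", "-i", input_url, "-c:v", "libx264", tmp_path],
        check=True,
    )
    # os.replace is atomic on the same filesystem: readers never see a half-written file.
    os.replace(tmp_path, final_path)
    return final_path
```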
These are the problems we wrote about in Running FFmpeg at scale: queue, retry, and the audit trail. Kubernetes is necessary for FFmpeg in production; it isn't sufficient.
The decision matrix
| Volume | Pattern | What you operate |
|---|---|---|
| <500 encodes/day | K8s Job per encode | YAML templating + cron |
| 500–50K/day | Worker Deployment + queue | + queue + retry logic |
| 50K–500K/day | KEDA queue-depth autoscaling | + ScaledObject manifests |
| 500K+/day, multi-tenant | Video transcoder K8s operator | + CRDs, leader election, pool routing |
Closing
If you have FFmpeg running in Kubernetes today, you are somewhere on this ladder. Knowing which rung you're on is half the battle; knowing which one comes next is the other half.
The operator pattern looks like a lot of moving parts when you read about it, but the alternative is hand-rolling each of its responsibilities into your application code, where they don't belong. Make Kubernetes do the Kubernetes work; make FFmpeg do the FFmpeg work; build a thin layer between them that is its own thing — and worth its own product surface.
If you want to skip the climb, that's what MpegFlow is. The operator, the queue topology, the audit layer, the strict-broker security pattern — pre-built, with FFmpeg invocations modeled as DAG stages. We're running a design partner program for broadcast and OTT teams that want to deploy ahead of GA.