Running FFmpeg in Kubernetes looks easy. It's a binary; you put it in a container; you schedule the container as a Pod. Done.
It is easy — for the first few hundred jobs. Then come the questions: how do retries work when a Pod is evicted mid-encode? How does a four-hour encode survive a node drain? How do you isolate one tenant's queue from another's without hard-coding pool names into your application? When does running FFmpeg as a Kubernetes Job give way to a worker pool, and when does the worker pool give way to a Kubernetes operator?
This post is the playbook. We've watched teams hit each transition the hard way; here's the shape of each pattern, the threshold where it stops working, and what to reach for next.
Pattern 0: a Kubernetes Job per encode (works for hundreds, breaks at thousands)
The simplest pattern: every encode is a `kind: Job`. Templated YAML, `kubectl apply`, FFmpeg runs to completion, the Job goes to `Succeeded` (or `Failed`).
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: encode-{{job_id}}
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: ffmpeg
          image: registry.example.com/ffmpeg:6.1
          args: ["-i", "{{input}}", "-c:v", "libx264", "{{output}}"]
```
What it gets right:
- Retry-on-failure semantics built in via the Job's `backoffLimit`
- Failures are visible in `kubectl get jobs`
- One-shot is the right cardinality for a one-shot encode
Where it breaks:
- Job creation is rate-limited at the Kubernetes API server. Tens of thousands of Job objects bog down etcd.
- No queue: every encode dispatches immediately or not at all.
- Image-pull churn: every Job pulls the FFmpeg image fresh unless you've configured an image pull policy carefully.
- No fine-grained pool isolation: tenant A's burst saturates the cluster, tenant B's encodes starve.
Threshold: works fine up to ~500 encodes/day. Above that, Pattern 1.
Pattern 1: a worker Deployment + queue (works to ~50K/day)
Now you separate dispatch from execution. Encodes land on a queue (Redis, RabbitMQ, NATS, SQS — pick the one your team already operates). A kind: Deployment of long-lived FFmpeg worker pods pulls jobs and runs them serially.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ffmpeg-workers
spec:
  replicas: 8
  selector:
    matchLabels:
      app: ffmpeg-worker
  template:
    metadata:
      labels:
        app: ffmpeg-worker
    spec:
      containers:
        - name: worker
          image: registry.example.com/ffmpeg-worker:1.4
          env:
            - name: QUEUE_URL
              value: "redis://queue:6379/0"
```
What this fixes:
- No more etcd churn; one worker pod runs many encodes serially.
- The queue gives you backpressure for free.
- Retries become a property of the queue (visibility timeout, max-deliveries), not of Kubernetes.
What still breaks:
- Replica count is static. If your queue has 5K jobs in it, your eight replicas are going to take a while.
- You can bolt on an HPA keyed to CPU or memory, but those metrics don't reflect queue depth; you're scaling on the symptom, not the cause.
- Hard tenant isolation requires running multiple Deployments by hand, with multiple queues, and routing logic in your application.
This pattern earns its keep up to about 50K encodes/day. Above that, Pattern 2.
Pattern 2: KEDA queue-depth autoscaling (works to ~500K/day)
KEDA is the Kubernetes Event-Driven Autoscaler — a controller that scales Deployments based on external metrics, including queue depth. Combine the queue from Pattern 1 with a ScaledObject and your worker count tracks job pressure directly.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ffmpeg-workers
spec:
  scaleTargetRef:
    name: ffmpeg-workers
  minReplicaCount: 0
  maxReplicaCount: 200
  triggers:
    - type: redis
      metadata:
        address: queue:6379
        listName: encodes
        listLength: "5"
```
What it fixes:
- Idle queue → workers scale to zero (or a minimum). Real cost savings for spiky workloads.
- Queue spike → workers scale up toward `maxReplicaCount` within seconds.
- Reactive autoscaling on the right signal: "are there jobs to run?" not "is CPU pegged?"
What still breaks above ~500K/day:
- Multi-tenant isolation is still flat: one queue, one Deployment. The tenant noise problem returns at the queue level — a noisy tenant fills the queue ahead of a quiet one.
- GPU and CPU pools require separate Deployments, separate queues, manual routing logic in your application layer.
- The lifecycle of "draining a pool for an upgrade without dropping in-flight encodes" is something you build by hand, every time.
This is where the operator pattern earns its keep.
Pattern 3: a video transcoder Kubernetes operator (multi-tenant production)
A Kubernetes operator is a controller that manages custom resources for your domain. For video transcoding, the operator manages a custom resource — call it WorkerPool — that represents a tenant + workload-shape combination.
```yaml
apiVersion: video.mpegflow.com/v1
kind: WorkerPool
metadata:
  name: tenant-acme-gpu
spec:
  tenant: acme
  workload: gpu-hevc
  queueRef: redis://queue:6379/acme-gpu
  scaling:
    min: 0
    max: 50
    metric: queueDepth
  resources:
    requests:
      nvidia.com/gpu: 1
```
What the operator gives you that hand-rolled patterns can't:
- One CRD per tenant or per workload class. The operator reconciles each into the right Deployment + ScaledObject + Service + RBAC scope. Adding a tenant becomes a CRD apply, not a YAML-template-and-pray exercise (a condensed reconcile sketch follows this list).
- Pool-level pause: cordon a `WorkerPool` to drain it for an upgrade without dropping in-flight encodes. Workers finish what they have, refuse new jobs, and the operator scales the pool down once empty.
- Leader election: the operator runs in HA. No single point of failure for the control plane.
- Per-pool routing: jobs go to the queue named in the WorkerPool spec, so your application layer doesn't need to know about pools at all. The operator keeps the topology.
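To make the reconcile idea concrete, here is a heavily condensed sketch of a `WorkerPool` handler built on the kopf framework and the official Kubernetes Python client. This is not MpegFlow's implementation; the group name comes from the CRD above, while the image, labels, queue naming, and everything else are illustrative, and a real operator would also handle updates, deletion, and ownership.

```python
# Condensed reconcile sketch (illustrative only): on WorkerPool creation,
# stamp out a matching Deployment and KEDA ScaledObject.
import kopf
from kubernetes import client, config

config.load_incluster_config()  # assumes an in-cluster service account

@kopf.on.create("video.mpegflow.com", "v1", "workerpools")
def create_pool(spec, name, namespace, **_):
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": spec["scaling"]["min"],
            "selector": {"matchLabels": {"pool": name}},
            "template": {
                "metadata": {"labels": {"pool": name}},
                "spec": {
                    "containers": [{
                        "name": "worker",
                        "image": "registry.example.com/ffmpeg-worker:1.4",
                        "env": [{"name": "QUEUE_URL", "value": spec["queueRef"]}],
                    }],
                },
            },
        },
    }
    scaledobject = {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"name": name},
            "minReplicaCount": spec["scaling"]["min"],
            "maxReplicaCount": spec["scaling"]["max"],
            "triggers": [{"type": "redis", "metadata": {"listName": f"{name}-queue"}}],
        },
    }
    client.AppsV1Api().create_namespaced_deployment(namespace, deployment)
    client.CustomObjectsApi().create_namespaced_custom_object(
        "keda.sh", "v1alpha1", namespace, "scaledobjects", scaledobject
    )
```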
This is the pattern we ship in MpegFlow's K8s deployment. Full reference: MpegFlow on Kubernetes with KEDA and the strict-broker security model that complements it.
What FFmpeg leaves to you regardless of which pattern you pick
Kubernetes solves the placement problem — where this particular FFmpeg invocation runs. It does not solve:
- Stderr parsing. FFmpeg writes progress on stderr in a format Prometheus does not natively understand. You parse it (a sketch follows this list).
- Partial-success handling. A six-rendition ABR ladder where rendition 4 OOM'd needs that one rendition retried on a higher-memory pool, not the whole job. A Kubernetes Pod restart reruns the entire encode.
- Audit trail. Kubernetes logs the Pod stdout/stderr. It does not record encoder version, container hash, parameters, input/output hashes — the things your compliance officer asks about.
- Idempotency. A Kubernetes `Job` retry will run FFmpeg twice with the same arguments. If your output path is the same, you'll write twice. Deterministic output naming and atomic upload are on you (also sketched below).
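For the stderr problem specifically, FFmpeg's `-progress` output is easier to consume than the human-readable stats line: it emits plain `key=value` blocks you can forward to whatever metrics layer you run. A minimal sketch; `report_progress` is a placeholder, and the Prometheus wiring is deliberately left out.

```python
# Sketch: read FFmpeg's machine-readable progress instead of scraping stderr.
# "-progress pipe:1" emits key=value lines on stdout (frame=, out_time=,
# speed=, progress=continue|end). report_progress is a placeholder.
import subprocess

def run_with_progress(input_url: str, output_path: str) -> int:
    proc = subprocess.Popen(
        [
            "ffmpeg", "-y", "-i", input_url, "-c:v", "libx264",
            "-progress", "pipe:1", "-nostats", output_path,
        ],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # a real worker keeps stderr for the audit trail
        text=True,
    )
    current = {}
    for line in proc.stdout:
        key, _, value = line.strip().partition("=")
        current[key] = value
        if key == "progress":         # marks the end of one progress block
            report_progress(current)  # placeholder: push to your metrics layer
            current = {}
    return proc.wait()

def report_progress(block: dict) -> None:
    print(f"out_time={block.get('out_time')} speed={block.get('speed')}")
```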
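And for idempotency, one workable shape is deterministic naming plus an atomic publish step, sketched below for a local filesystem target; the naming scheme and paths are illustrative, and an object-store upload would take the place of the final rename.

```python
# Sketch: deterministic output naming plus atomic publish, so a retried Job
# overwrites the same artifact instead of producing a second copy.
import hashlib
import os
import subprocess

def encode_idempotent(job_id: str, input_url: str, out_dir: str) -> str:
    # Name the output from stable job facts, not a timestamp or random suffix.
    digest = hashlib.sha256(f"{job_id}:{input_url}".encode()).hexdigest()[:16]
    final_path = os.path.join(out_dir, f"{digest}.mp4")
    tmp_path = os.path.join(out_dir, f"{digest}.partial.mp4")

    if os.path.exists(final_path):
        return final_path  # a previous attempt already published this output

    subprocess.run(
        ["ffmpeg", "-y", "-i", input_url, "-c:v", "libx264", tmp_path],
        check=True,
    )
    # os.replace is atomic on the same filesystem: readers never see a half-written file.
    os.replace(tmp_path, final_path)
    return final_path
```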
These are the problems we wrote about in Running FFmpeg at scale: queue, retry, and the audit trail. Kubernetes is necessary for FFmpeg in production; it isn't sufficient.
The decision matrix
| Volume | Pattern | What you operate |
|---|---|---|
| <500 encodes/day | K8s Job per encode | YAML templating + cron |
| 500–50K/day | Worker Deployment + queue | + queue + retry logic |
| 50K–500K/day | KEDA queue-depth autoscaling | + ScaledObject manifests |
| 500K+/day, multi-tenant | Video transcoder K8s operator | + CRDs, leader election, pool routing |
Closing
If you have FFmpeg running in Kubernetes today, you are somewhere on this ladder. Knowing which rung you're on is half the battle; knowing which one comes next is the other half.
The operator pattern looks like a lot of moving parts when you read about it, but the alternative is hand-rolling each of its responsibilities into your application code, where they don't belong. Make Kubernetes do the Kubernetes work; make FFmpeg do the FFmpeg work; build a thin layer between them that is its own thing — and worth its own product surface.
If you want to skip the climb, that's what MpegFlow is. The operator, the queue topology, the audit layer, the strict-broker security pattern — pre-built, with FFmpeg invocations modeled as DAG stages. We're running a design partner program for broadcast and OTT teams that want to deploy ahead of GA.