There are two tribes in video infrastructure. One says self-hosting is masochism: the cloud is obviously cheaper and easier, and anyone who runs their own encoders is stuck in 2014. The other says cloud transcoding is a tax on people who haven't done the math: AWS is bleeding you, and you should be running this in your own DC. Both tribes are mostly wrong. The honest answer is "it depends," but it depends in specific ways that are calculable in advance.
This post is the calculation. We're going to look at when self-hosted transcoding actually wins on cost and operations, when it doesn't, and how to think about the hybrid path that most teams should be on but most aren't.
We work on a video pipeline product that runs both ways — same binary, SaaS or self-hosted — so we have a stake in this not being a holy war. The framing here is "what does the math say in your specific situation," not "you should do what we sell."
What we're comparing
For concreteness, we'll compare against AWS Elemental MediaConvert — the most common managed transcoding service that B2B teams measure against. Most of the analysis transfers directly to Bitmovin, Coconut, Mux (their VOD product), and other managed transcoders, with their per-vendor pricing details swapped in. We have honest head-to-head comparisons of MpegFlow with each of these if you want the per-vendor breakdown.
The "self-hosted" alternative we're modeling is:
- FFmpeg-based encoding running on your own hardware (or your cloud account)
- An orchestration layer that handles queue, retry, and audit trail (we wrote about this — it's not nothing)
- Storage, CDN, and downstream delivery handled separately
- Your own ops team, or a managed service for the orchestration but not the encoding
We're explicitly not comparing "use FFmpeg in a Python script" to MediaConvert. That comparison is uncharitable to MediaConvert; running FFmpeg as production infrastructure is more work than people who haven't done it think.
The actual cost of MediaConvert
AWS Elemental MediaConvert prices per minute of output, not input. The pricing tiers are roughly:
| Quality preset | Output resolution | Approximate per-minute cost |
|---|---|---|
| Basic | up to 1080p, single-pass | $0.0075 / min |
| Professional | up to 1080p, two-pass / advanced | $0.015 / min |
| Pro 4K | up to 2160p | $0.030 / min |
| Pro 4K HDR / advanced | 4K HDR, audio object, 8K | $0.075 / min and up |
These are approximate AWS list prices for us-east-1; verify current rates in the AWS console before any decision.
Critical detail: per minute of output, summed across renditions. If you encode a 60-minute input to a 5-rendition ABR ladder of mostly Professional-tier outputs, that's 60 × 5 × $0.015 = $4.50 for the transcode of that one input. Add whatever you pay for input storage, output storage, and egress. Then add CDN. Then add captions if you used MediaConvert's caption sidecar features. The number creeps.
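The per-title arithmetic above generalizes to any ladder. Here is a minimal sketch, assuming the approximate list rates from the table; the `RATES` dictionary, its tier keys, and the `title_cost` function are illustrative names for this post, not part of any AWS SDK.

```python
# Approximate per-output-minute rates from the table above (USD).
# Assumption: these are the rough list prices quoted in this post;
# check current AWS pricing before relying on them.
RATES = {
    "basic": 0.0075,
    "professional": 0.015,
    "pro_4k": 0.030,
}

def title_cost(input_minutes, ladder):
    """Transcode cost for one input.

    Every rendition in the ladder bills its own output minutes,
    so the cost is summed across renditions.
    """
    return sum(input_minutes * RATES[tier] for tier in ladder)

# The 60-minute, 5-rendition Professional example from the text.
ladder = ["professional"] * 5
print(f"${title_cost(60, ladder):.2f}")
```

Swapping one rung for `"pro_4k"` shows how a single premium rendition shifts the per-title number.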
A few rules of thumb that hold across most accounts:
- VOD library transcode runs $0.30–$2.00 per "title" depending on length and ladder
- Live encoding (different SKU — MediaLive) is roughly $0.50–$2.00 per channel-hour
- Anything 4K HDR costs roughly 4× a comparable 1080p workload
- Heavy filter work (logo overlay, watermarks, complex captions) often costs the same as a tier upgrade per output
If you're processing 1,000 minutes of input per day to a 5-rendition Professional ladder, that's 1,000 × 30 × 5 × $0.015 ≈ $2,250/month at the list rates above on MediaConvert alone, plus storage and egress.
If those numbers feel high, this is the moment teams start thinking about self-host. Whether it's actually cheaper depends on what self-hosting really costs.
The actual cost of self-hosted
Self-hosting cost models almost always under-count. Here's a more honest list of the line items, because the difference between "MediaConvert is $10K/month" and "self-host is $4K/month" stops being meaningful when you remember that ops time costs $$$ too.
Hardware or cloud compute. Encoder workers. CPU-bound at minimum, plus optional GPU pools for throughput-heavy work. As a rough order of magnitude:
- A 16-core / 32-thread CPU box (something like a c7i.8xlarge in EC2, which exposes 32 vCPUs, or a bare-metal box from Hetzner/OVH) can sustain 10–25 concurrent 1080p Professional-tier encodes, depending on preset
- The same spec on bare metal (no virtualization tax, dedicated NVMe) often sustains 1.5–2× as many
- GPU boxes (NVENC) deliver 5–20× the throughput per dollar for live and other throughput-heavy workloads, but are the wrong tool for premium-VOD quality
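To turn the per-box figures above into a fleet size, a rough sizing sketch follows. It assumes each encode slot runs at about real time (one output minute per wall-clock minute) and that boxes are kept about 60% busy. Both numbers are assumptions of this example, not measurements; `boxes_needed` and its parameters are hypothetical names.

```python
import math

def boxes_needed(output_minutes_per_day, concurrent_encodes_per_box,
                 realtime_factor=1.0, utilization=0.6):
    """Estimate how many worker boxes a daily output volume requires.

    realtime_factor: output minutes produced per wall-clock minute by one
        encode slot (assumed 1.0 here; measure your own presets).
    utilization: fraction of the day a box is actually encoding (assumed
        0.6 to leave headroom for spikes and retries).
    """
    minutes_per_box_per_day = (
        concurrent_encodes_per_box * realtime_factor * utilization * 24 * 60
    )
    return math.ceil(output_minutes_per_day / minutes_per_box_per_day)

# 1,000 input minutes/day to a 5-rendition ladder is 5,000 output
# minutes/day; use the low end of the 10-25 concurrent-encode range.
print(boxes_needed(5_000, concurrent_encodes_per_box=10))
```

Under these assumptions, low volumes fit on a single box, so the compute line is rarely what makes self-hosting expensive at that scale.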
Storage. Local fast disk for stage-in/stage-out per worker. Durable object storage for inputs and outputs. Network bandwidth between workers and durable storage. None of these is free; they're just usually smaller than the encoder line.
Orchestration layer. Either you build it (multi-quarter project — see our previous post on what's involved) or you buy/use one. Building costs eng time; buying costs $/month.
Ops time. Most under-counted. Real numbers: a self-hosted video stack at modest scale (hundreds of jobs/hour, dozens of worker boxes) consumes 20–50% of one engineer's time, ongoing. You don't see the cost on a line item, but it's there. At a $200K fully-loaded annual cost for a senior infra engineer, that's $40K–$100K/year of hidden ops cost.
Replacement features. MediaConvert ships with several things that aren't really transcoding but feel free when you use it:
- DRM packaging (Widevine, FairPlay, PlayReady) — pay per output, but the SDK integration is done
- Caption sidecar generation — burn-in, sidecar formats, language tags
- HDR-to-SDR tone mapping presets that are actually production-tested
- A web UI for ops people to visually inspect failed jobs
When you self-host, each of these becomes its own decision: do you build, do you buy a focused vendor (Vualto for DRM, etc.), or do you do without? Each one has a real cost.
The break-even math
A useful first-cut formula. Total monthly cost of a managed service is roughly:
managed_cost ≈ minutes_per_month × renditions_per_input × per_minute_rate
+ storage_cost + egress_cost
Total monthly cost of self-host is roughly:
selfhost_cost ≈ compute_per_month + storage_cost + egress_cost
+ orchestration_cost
+ (eng_time_per_month × eng_burdened_rate)
+ replacement_feature_cost
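The two formulas above can be sketched as a small cost model. The function and parameter names mirror the formula terms. The 20% engineer share and $200K fully-loaded cost come from the ops-time estimate earlier in this post; every other dollar figure in the example is a made-up placeholder, not a benchmark.

```python
from dataclasses import dataclass

@dataclass
class SharedCosts:
    """Line items that appear on both sides of the comparison (USD/month)."""
    storage: float
    egress: float

def managed_cost(minutes_per_month, renditions_per_input, per_minute_rate, shared):
    transcode = minutes_per_month * renditions_per_input * per_minute_rate
    return transcode + shared.storage + shared.egress

def selfhost_cost(compute_per_month, orchestration_cost, eng_time_per_month,
                  eng_burdened_rate, replacement_feature_cost, shared):
    # eng_time_per_month is a fraction of one engineer (0.2 = 20%);
    # eng_burdened_rate is that engineer's fully-loaded monthly cost.
    ops = eng_time_per_month * eng_burdened_rate
    return (compute_per_month + shared.storage + shared.egress
            + orchestration_cost + ops + replacement_feature_cost)

# Illustrative inputs only: 50,000 input minutes/month, 5 renditions.
shared = SharedCosts(storage=300, egress=400)
managed = managed_cost(50_000, 5, 0.015, shared)
selfhost = selfhost_cost(
    compute_per_month=800,            # placeholder
    orchestration_cost=500,           # placeholder
    eng_time_per_month=0.2,           # low end of the 20-50% range
    eng_burdened_rate=200_000 / 12,   # $200K/year fully loaded
    replacement_feature_cost=0,       # placeholder
    shared=shared,
)
print(f"managed ${managed:,.0f}/mo vs self-host ${selfhost:,.0f}/mo")
```

With these placeholder inputs the managed side comes out cheaper, almost entirely because of the ops-time term. Compute is the smallest number in the self-host total.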
The break-even point — where self-host starts winning — depends mostly on compute_per_month and eng_time_per_month. Some rough thresholds we've seen:
| Monthly volume | Typical recommendation |
|---|---|
| < 10K minutes / month | Stay managed. Self-host overhead exceeds the savings. The math doesn't justify the ops layer. |
| 10K–100K minutes / month | Probably managed, unless you have a specific reason. Look at your single biggest cost driver — if it's compute, maybe; if it's something else (egress, DRM), self-hosting compute won't help. |
| 100K–1M minutes / month | Hybrid wins. Run baseline workload on managed (consistent cost, no ops burden). Burst the spiky/expensive parts to self-hosted compute. Most pragmatic teams in this band end up here. |
| > 1M minutes / month | Self-host can win clearly, if you have the team for it. Companies like Netflix, Hulu, and the BBC self-host, but they each treat video infrastructure as a discipline, not a side project. |
The bands are approximate. The point is the answer is rarely "always cloud" or "always self-host." It's volume-dependent and capability-dependent.
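One way to locate your own band is to solve for the crossover volume directly. The sketch below assumes self-host cost splits into a fixed monthly part (orchestration, ops time, replacement features) and a compute part that scales linearly with output minutes. That linear split is a simplifying assumption of this example, and `break_even_minutes` and the $0.003 self-host compute rate are hypothetical. Storage and egress appear on both sides of the comparison, so they cancel out.

```python
def break_even_minutes(fixed_selfhost_per_month, renditions_per_input,
                       managed_rate, selfhost_compute_rate):
    """Input minutes/month above which self-host is cheaper.

    Returns None when self-host compute is not cheaper per output minute,
    in which case no volume ever pays back the fixed costs.
    """
    saving_per_input_minute = renditions_per_input * (
        managed_rate - selfhost_compute_rate
    )
    if saving_per_input_minute <= 0:
        return None
    return fixed_selfhost_per_month / saving_per_input_minute

# Fixed costs: 20% of a $200K/year engineer plus a placeholder $500/month
# for orchestration. Rates: $0.015 managed vs an assumed $0.003 self-host.
fixed = 0.2 * 200_000 / 12 + 500
minutes = break_even_minutes(fixed, 5, 0.015, 0.003)
print(f"break-even at about {minutes:,.0f} input minutes/month")
```

Rerunning with 50% of an engineer's time instead of 20% moves the crossover substantially, which is why the ops-time estimate deserves more scrutiny than the compute quote.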
When the answer is forced — compliance and sovereignty
There's a category where the math doesn't matter: when you can't put the content in a public cloud at all. This isn't rare — it's most of broadcast, government, and enterprise media:
- Broadcast / studio content under contractual restriction to specific data residency
- Pre-release content where a leak from cloud-side breach is existential
- EU GDPR / data residency requirements that constrain processing geography
- Air-gapped environments for defense, classified, or pre-broadcast workflows
- Regulated industries (some healthcare, some finance) where the workload looks like video but the framework around it is HIPAA/SOX
In these cases the question isn't "is self-hosted cheaper?" — it's "can I use the cloud at all?" If you're in that category, you're self-hosting. The interesting decision becomes which self-host stack, not whether.
The hybrid pattern most teams should run
The pattern that fits most teams in the 100K–1M minutes/month band: run baseline on managed, burst to self-hosted.
Concretely:
- Steady-state, predictable transcode volume → MediaConvert (or your managed vendor)
- Spiky workloads, archive backlogs, premium-quality renditions you can't afford at managed rates → self-hosted compute
- A single orchestration layer that can dispatch to either, based on per-job classification
This works because the managed service handles the long tail of input variability (you don't have to maintain coverage for every codec quirk), and the self-host pool handles the high-throughput repetitive bulk where the per-minute rate is the dominant cost.
The architectural challenge is the orchestration layer. If you're building per-vendor adapters, the integration work eats the savings. If your orchestration is a single primitive that abstracts the pool, the math works.
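The per-job classification described above can be sketched as a single routing function. This is an illustration, not a real product's dispatch logic: the `Job` fields, the codec allowlist, and the bulk threshold are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum

class Pool(Enum):
    MANAGED = "managed"
    SELF_HOSTED = "self_hosted"

@dataclass
class Job:
    input_codec: str
    output_minutes: float
    is_backlog: bool = False
    premium_quality: bool = False
    must_stay_on_prem: bool = False

# Assumption: codecs the self-hosted pool has actually been tested against.
SELF_HOSTED_CODECS = {"h264", "hevc", "prores"}
# Assumption: jobs at or above this size count as repetitive bulk work.
BULK_THRESHOLD_MINUTES = 500

def classify(job: Job) -> Pool:
    # Compliance overrides cost: restricted content never leaves your pool.
    if job.must_stay_on_prem:
        return Pool.SELF_HOSTED
    # Let the managed service absorb the long tail of unusual inputs.
    if job.input_codec not in SELF_HOSTED_CODECS:
        return Pool.MANAGED
    # Spiky, backlog, premium, or bulk work goes to self-hosted compute.
    if (job.is_backlog or job.premium_quality
            or job.output_minutes >= BULK_THRESHOLD_MINUTES):
        return Pool.SELF_HOSTED
    # Everything else is steady-state baseline.
    return Pool.MANAGED

print(classify(Job(input_codec="h264", output_minutes=30)))
print(classify(Job(input_codec="h264", output_minutes=900, is_backlog=True)))
```

Keeping the decision in one function means a change in routing policy is a change to one rule, not to a per-vendor adapter.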
What a good self-host stack looks like in 2026
If you've decided to self-host (or hybrid), the components have settled into a stable shape. A reasonable stack:
- Compute: Bare-metal CPU boxes (Hetzner, OVH, or co-lo) for cost-sensitive workloads; cloud CPU+GPU for elastic burst
- Encoder: FFmpeg, pinned to a specific compiled version per worker pool, in a container
- Orchestration: A DAG-based runtime (we wrote about why DAG specifically). Generic engines (Airflow, Temporal) work but require a video-specific layer on top
- Queue: Anything boring — RabbitMQ, SQS, Postgres-as-queue. Don't over-engineer
- Storage: Object storage (S3, R2, or self-hosted MinIO). Local NVMe for stage-in/out
- Delivery: Multi-CDN if you need failover; single CDN if you don't
- DRM: Buy this. Don't build. Use Vualto, EZDRM, Widevine SDK, etc.
- Captions: Sidecar formats from FFmpeg + a caption-specific tool for burn-in if needed
- Observability: Same as the rest of your infra. Your encoders deserve real metrics, not a separate stack
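Pinning the encoder is easy to state and easy to let drift. One way to enforce it is a startup check that refuses to take work unless the binary reports the expected version. The sketch below assumes the first line of `ffmpeg -version` output begins with `ffmpeg version <token>`; the pinned value `6.1.1` and the function names are hypothetical.

```python
import re
import subprocess

# Hypothetical pin for one worker pool; set this to the build you ship.
PINNED_FFMPEG_VERSION = "6.1.1"

def parse_ffmpeg_version(version_output: str) -> str:
    """Extract the version token from the first line of `ffmpeg -version`."""
    match = re.match(r"ffmpeg version (\S+)", version_output)
    if match is None:
        raise ValueError("unrecognized `ffmpeg -version` output")
    return match.group(1)

def check_pinned_ffmpeg(binary: str = "ffmpeg") -> None:
    """Fail fast at worker startup if the binary is not the pinned build."""
    result = subprocess.run(
        [binary, "-version"], capture_output=True, text=True, check=True
    )
    found = parse_ffmpeg_version(result.stdout)
    # Exact match on purpose: distro-patched or git builds report a
    # different token and should be treated as a different encoder.
    if found != PINNED_FFMPEG_VERSION:
        raise RuntimeError(
            f"expected FFmpeg {PINNED_FFMPEG_VERSION}, found {found}"
        )
```

A worker would call `check_pinned_ffmpeg()` once before pulling its first job, so a bad image fails at deploy time instead of producing subtly different renditions.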
The watch-out: you can do all of the above and end up with a stack that's harder to maintain than the managed service you replaced. The discipline is: every component you self-host needs to be one your team is committed to operating, not just installing.
FFmpeg as a service, or self-hosted parity — same binary
The reason we built MpegFlow the way we did — FFmpeg as a service with the same binary running SaaS or self-hosted — is that the build-vs-buy decision shouldn't be a wall. Most teams don't know on day one whether they'll grow into self-host scale. They want to validate cheaply on a managed offering and graduate to self-host when the math justifies it, without rewriting their pipelines.
That's the bet: a single declarative pipeline definition (the DAG) and a single runtime that can dispatch jobs to a managed pool we operate, or to a self-hosted pool you operate, or to both. Switching is a config change, not a code change. The orchestration, retry semantics, audit trail, and DRM/caption integrations are the same on either side.
If that maps to where you are, the beta cohort is open. We're shipping the encoder MVP this quarter; you'll get an email when your slot can take traffic.
If you're earlier in the decision and want the operational depth on self-hosted FFmpeg specifically, the queue/retry/audit post covers the parts of self-hosting most teams under-build. If you're earlier still and trying to decide whether any of this should be a graph at all, the DAG thesis is where to start. If you've already decided to self-host and you want the concrete cost math on running encoders on AWS Spot / GCP Preemptible / Azure Spot, the cost-aware spot-instance encoder pool architecture is the next read — that's where the per-minute economics actually land. And for the underlying decision framework, build vs buy in 2026 treats "which layers do you build" as the actual question instead of a binary.
The honest one-line answer to "should I self-host?": probably not yet, and definitely not without doing the math. The marketing on either side will not do it for you.