Self-hosted video transcoding without AWS: a build-vs-buy honesty

MpegFlow

When self-hosted video transcoding actually beats AWS MediaConvert. Cost math, hardware, ops trade-offs, hybrid architectures — without the vendor slide-deck framing.

There are two tribes in video infrastructure. One says self-hosting is masochism, the cloud is obviously cheaper and easier, anyone who runs their own encoders is stuck in 2014. The other says cloud transcoding is a tax on people who haven't done the math, AWS is bleeding you, you should be running this in your own DC. Both tribes are mostly wrong. The honest answer is "it depends," but it depends in specific ways that are calculable in advance.

This post is the calculation. We're going to look at when self-hosted transcoding actually wins on cost and operations, when it doesn't, and how to think about the hybrid path that most teams should be on but most aren't.

We work on a video pipeline product that runs both ways — same binary, SaaS or self-hosted — so we have a stake in this not being a holy war. The framing here is "what does the math say in your specific situation," not "you should do what we sell."

What we're comparing

For concreteness, we'll compare against AWS Elemental MediaConvert — the most common managed transcoding service that B2B teams measure against. Most of the analysis transfers directly to Bitmovin, Coconut, Mux (their VOD product), and other managed transcoders, with their per-vendor pricing details swapped in. We have honest head-to-head comparisons of MpegFlow with each of these if you want the per-vendor breakdown.

The "self-hosted" alternative we're modeling is:

FFmpeg-based encoding running on your own hardware (or your cloud account)
An orchestration layer that handles queue, retry, and audit trail (we wrote about this — it's not nothing)
Storage, CDN, and downstream delivery handled separately
Your own ops team, or a managed service for the orchestration but not the encoding

We're explicitly not comparing "use FFmpeg in a Python script" to MediaConvert. That comparison is uncharitable to MediaConvert; running FFmpeg as production infrastructure is more work than people who haven't done it think.

The actual cost of MediaConvert

AWS Elemental MediaConvert prices per minute of output, not input. The pricing tiers are roughly:

Quality preset	Output resolution	Approximate per-minute cost
Basic	up to 1080p, single-pass	$0.0075 / min
Professional	up to 1080p, two-pass / advanced	$0.015 / min
Pro 4K	up to 2160p	$0.030 / min
Pro 4K HDR / advanced	4K HDR, audio object, 8K	$0.075 / min and up

These are approximate AWS list prices for us-east-1; verify current rates in the AWS console before any decision.

Critical detail: per minute of output, summed across renditions. If you encode a 60-minute input to a 5-rendition ABR ladder of mostly Professional-tier outputs, that's 60 × 5 × $0.015 = $4.50 for the transcode of that one input. Add whatever you pay for input storage, output storage, and egress. Then add CDN. Then add captions if you used MediaConvert's caption sidecar features. The number creeps.
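As a quick sanity check on that arithmetic, here is a minimal sketch of the per-title calculation. The rate table copies the approximate list prices above; the tier keys and the `title_cost` helper are names made up for illustration, not anything from the AWS SDK.

```python
# Approximate list rates from the table above (USD per output minute).
# Tier keys are illustrative labels, not AWS identifiers.
RATES = {
    "basic": 0.0075,
    "professional": 0.015,
    "pro_4k": 0.030,
    "pro_4k_hdr": 0.075,
}


def title_cost(input_minutes, rendition_tiers):
    """Transcode cost for one input: every rendition is billed per output minute."""
    return sum(input_minutes * RATES[tier] for tier in rendition_tiers)


# The example from the text: 60-minute input, 5 Professional-tier renditions.
ladder = ["professional"] * 5
print(f"${title_cost(60, ladder):.2f}")  # 60 x 5 x 0.015
```

Swapping one or two ladder entries for a 4K tier shows how quickly a mixed ladder moves the per-title number. Storage, egress, and CDN are deliberately left out here.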

A few rules of thumb that hold across most accounts:

VOD library transcode runs $0.30–$2.00 per "title" depending on length and ladder
Live encoding (different SKU — MediaLive) is roughly $0.50–$2.00 per channel-hour
Anything 4K HDR roughly 4× a comparable 1080p workload
Heavy filter work (logo overlay, watermarks, complex captions) often costs the same as a tier upgrade per output

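If you're processing 1,000 minutes of input per day to a 5-rendition Professional ladder, that's 1,000 × 5 × $0.015 = $75/day, or roughly $2,250/month at list price on MediaConvert alone, plus storage and egress. The cost scales linearly with volume: at roughly 3,300–4,400 input minutes per day, the same ladder runs $7,500–$10,000/month.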

If those numbers feel high, this is the moment teams start thinking about self-host. Whether it's actually cheaper depends on what self-hosting really costs.

The actual cost of self-hosted

Self-hosting cost models almost always under-count. Here's a more honest list of the line items, because the difference between "MediaConvert is $10K/month" and "self-host is $4K/month" stops being meaningful when you remember that ops time costs $$$ too.

Hardware or cloud compute. Encoder workers. CPU-bound at minimum, plus optional GPU pools for throughput-heavy work. As a rough order of magnitude:

A 16-core / 32-thread CPU box (something like a c7i.8xlarge in EC2, which exposes 32 vCPUs, or a bare-metal box from Hetzner/OVH) can sustain 10–25 concurrent 1080p Professional-tier encodes, depending on preset
The same spec on bare metal (no virtualization tax, dedicated NVMe) often sustains 1.5–2× as many
GPU boxes (NVENC) deliver 5–20× the throughput per dollar for live and lossy throughput workloads, but are wrong for premium-VOD quality
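To turn those concurrency figures into a box count, a rough sizing sketch follows. It assumes each encode runs at about real-time speed and that the pool averages 60% utilization. Both numbers are placeholders of ours, not measurements, and `boxes_needed` is a hypothetical helper; benchmark your own presets before relying on the output.

```python
import math


def boxes_needed(input_minutes_per_day, renditions, concurrent_encodes,
                 speed_vs_realtime=1.0, utilization=0.6):
    """Estimate how many encoder boxes a daily workload needs.

    speed_vs_realtime and utilization are assumed values, not benchmarks.
    """
    output_minutes = input_minutes_per_day * renditions
    # Output minutes one box can produce in 24 hours at the given utilization.
    per_box = concurrent_encodes * speed_vs_realtime * 24 * 60 * utilization
    return math.ceil(output_minutes / per_box)


# Low and high end of the 10-25 concurrent-encode range, 5-rendition ladder.
for concurrent in (10, 25):
    print(concurrent, boxes_needed(50_000, 5, concurrent_encodes=concurrent))
```

The spread between the two results is the point: preset choice alone can more than double the hardware line, so the concurrency figure is worth measuring rather than guessing.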

Storage. Local fast disk for stage-in/stage-out per worker. Durable object storage for inputs and outputs. Network bandwidth between workers and durable storage. None of these is free; they're just usually smaller than the encoder line.

Orchestration layer. Either you build it (multi-quarter project — see our previous post on what's involved) or you buy/use one. Building costs eng time; buying costs $/month.

Ops time. Most under-counted. Real numbers: a self-hosted video stack at modest scale (hundreds of jobs/hour, dozens of worker boxes) consumes 20–50% of one engineer's time, ongoing. You don't see the cost on a line item, but it's there. At a $200K fully-loaded annual cost for a senior infra engineer, that's $40K–$100K/year of hidden ops cost.

Replacement features. MediaConvert ships with several things that aren't really transcoding but feel free when you use it:

DRM packaging (Widevine, FairPlay, PlayReady) — pay per output, but the SDK integration is done
Caption sidecar generation — burn-in, sidecar formats, language tags
HDR-to-SDR tone mapping presets that are actually production-tested
A web UI for ops people to visually inspect failed jobs

When you self-host, each of these becomes its own decision: do you build, do you buy a focused vendor (Vualto for DRM, etc.), or do you do without? Each one has a real cost.

The break-even math

A useful first-cut formula. Total monthly cost of a managed service is roughly:

managed_cost ≈ minutes_per_month × renditions_per_input × per_minute_rate
              + storage_cost + egress_cost

Total monthly cost of self-host is roughly:

selfhost_cost ≈ compute_per_month + storage_cost + egress_cost
              + orchestration_cost
              + (eng_time_per_month × eng_burdened_rate)
              + replacement_feature_cost
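The two formulas can be put side by side in a few lines. This is a sketch under stated assumptions: every dollar figure in the example call is a hypothetical placeholder to be replaced with your own numbers, and engineering time is expressed as a fraction of one engineer's burdened annual cost.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    minutes_per_month: float       # input minutes
    renditions_per_input: int
    per_minute_rate: float         # USD per output minute
    storage_cost: float = 0.0      # USD per month
    egress_cost: float = 0.0       # USD per month


def managed_cost(w: Workload) -> float:
    return (w.minutes_per_month * w.renditions_per_input * w.per_minute_rate
            + w.storage_cost + w.egress_cost)


def selfhost_cost(w: Workload, compute_per_month: float, orchestration_cost: float,
                  eng_fraction: float, eng_burdened_annual: float,
                  replacement_feature_cost: float) -> float:
    # Monthly engineering cost: share of one engineer's burdened annual cost.
    eng_cost = eng_fraction * eng_burdened_annual / 12
    return (compute_per_month + w.storage_cost + w.egress_cost
            + orchestration_cost + eng_cost + replacement_feature_cost)


# Hypothetical inputs, for illustration only.
w = Workload(minutes_per_month=200_000, renditions_per_input=5, per_minute_rate=0.015)
managed = managed_cost(w)
selfhost = selfhost_cost(w, compute_per_month=3_000, orchestration_cost=500,
                         eng_fraction=0.3, eng_burdened_annual=200_000,
                         replacement_feature_cost=1_000)
print(f"managed ${managed:,.0f}/mo vs self-host ${selfhost:,.0f}/mo")
```

Storage and egress appear on both sides, so they cancel unless self-hosting changes where the bytes live. The terms that actually move the comparison are compute and engineering time.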

The break-even point — where self-host starts winning — depends mostly on compute_per_month and eng_time_per_month. Some rough thresholds we've seen:

Monthly volume	Typical recommendation
< 10K minutes / month	Stay managed. Self-host overhead exceeds the savings. The math doesn't justify the ops layer.
10K–100K minutes / month	Probably managed, unless you have a specific reason. Look at your single biggest cost driver — if it's compute, maybe; if it's something else (egress, DRM), self-hosting compute won't help.
100K–1M minutes / month	Hybrid wins. Run baseline workload on managed (consistent cost, no ops burden). Burst the spiky/expensive parts to self-hosted compute. Most pragmatic teams in this band end up here.
> 1M minutes / month	Self-host can win clearly, if you have the team for it. Companies like Netflix, Hulu, and BBC self-host — but they each have video infrastructure as a discipline, not a side project.

The bands are approximate. The point is the answer is rarely "always cloud" or "always self-host." It's volume-dependent and capability-dependent.

When the answer is forced — compliance and sovereignty

There's a category where the math doesn't matter: when you can't put the content in a public cloud at all. This isn't rare — it's most of broadcast, government, and enterprise media:

Broadcast / studio content under contractual restriction to specific data residency
Pre-release content where a leak from cloud-side breach is existential
EU GDPR / data residency requirements that constrain processing geography
Air-gapped environments for defense, classified, or pre-broadcast workflows
Regulated industries (some healthcare, some finance) where the workload looks like video but the framework around it is HIPAA/SOX

In these cases the question isn't "is self-hosted cheaper?" — it's "can I use the cloud at all?" If you're in that category, you're self-hosting. The interesting decision becomes which self-host stack, not whether.

The hybrid pattern most teams should run

The pattern that fits most teams in the 100K–1M minutes/month band: run baseline on managed, burst to self-hosted.

Concretely:

Steady-state, predictable transcode volume → MediaConvert (or your managed vendor)
Spiky workloads, archive backlogs, premium-quality renditions you can't afford at managed rates → self-hosted compute
A single orchestration layer that can dispatch to either, based on per-job classification

This works because the managed service handles the long tail of input variability (you don't have to maintain coverage for every codec quirk), and the self-host pool handles the high-throughput repetitive bulk where the per-minute rate is the dominant cost.

The architectural challenge is the orchestration layer. If you're building per-vendor adapters, the integration work eats the savings. If your orchestration is a single primitive that abstracts the pool, the math works.
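A minimal sketch of what "a single primitive that abstracts the pool" could look like: one classification function and one dispatch call, with the backends plugged in as plain callables. The `Job` fields, the `Pool` enum, and the routing rule are all hypothetical. They mirror the list above, not any particular product's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Pool(Enum):
    MANAGED = "managed"
    SELF_HOSTED = "self_hosted"


@dataclass(frozen=True)
class Job:
    job_id: str
    is_backlog: bool = False       # archive re-encode, not time-sensitive
    is_burst: bool = False         # part of a traffic spike
    premium_quality: bool = False  # rendition too expensive at managed rates


def classify(job: Job) -> Pool:
    """Per-job routing: steady-state work stays managed, the rest bursts out."""
    if job.is_backlog or job.is_burst or job.premium_quality:
        return Pool.SELF_HOSTED
    return Pool.MANAGED


def dispatch(job: Job, submitters: Dict[Pool, Callable[[Job], str]]) -> str:
    """Send the job to whichever backend its classification selects."""
    return submitters[classify(job)](job)


# Stand-in submitters; real ones would call the vendor API or your own queue.
submitters = {
    Pool.MANAGED: lambda job: f"managed:{job.job_id}",
    Pool.SELF_HOSTED: lambda job: f"self_hosted:{job.job_id}",
}
print(dispatch(Job("daily-upload-1"), submitters))
print(dispatch(Job("archive-0042", is_backlog=True), submitters))
```

Keeping the routing rule in one function means adding a vendor is one new entry in `submitters`, rather than a new adapter threaded through every pipeline.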

What a good self-host stack looks like in 2026

If you've decided to self-host (or hybrid), the components have settled into a stable shape. A reasonable stack:

Compute: Bare-metal CPU boxes (Hetzner, OVH, or co-lo) for cost-sensitive workloads; cloud CPU+GPU for elastic burst
Encoder: FFmpeg, pinned to a specific compiled version per worker pool, in a container
Orchestration: A DAG-based runtime (we wrote about why DAG specifically). Generic engines (Airflow, Temporal) work but require a video-specific layer on top
Queue: Anything boring — RabbitMQ, SQS, Postgres-as-queue. Don't over-engineer
Storage: Object storage (S3, R2, or self-hosted MinIO). Local NVMe for stage-in/out
Delivery: Multi-CDN if you need failover; single CDN if you don't
DRM: Buy this. Don't build. Use Vualto, EZDRM, Widevine SDK, etc.
Captions: Sidecar formats from FFmpeg + a caption-specific tool for burn-in if needed
Observability: Same as the rest of your infra. Your encoders deserve real metrics, not a separate stack
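Pinning the encoder only helps if workers verify the pin. One way to do that, sketched here as an assumption rather than a prescribed method, is a startup check that parses the first line of `ffmpeg -version` and refuses work on a mismatch. The function names are hypothetical; the only external call is the `ffmpeg -version` command itself.

```python
import subprocess


def parse_ffmpeg_version(version_output: str) -> str:
    """Extract the version token from the first line of `ffmpeg -version`.

    That line normally looks like: "ffmpeg version 6.1.1 Copyright (c) ..."
    Custom builds may report a git-style token instead; it is compared as-is.
    """
    lines = version_output.splitlines()
    tokens = lines[0].split() if lines else []
    if len(tokens) < 3 or tokens[:2] != ["ffmpeg", "version"]:
        raise ValueError(f"unexpected ffmpeg -version output: {version_output[:80]!r}")
    return tokens[2]


def worker_matches_pin(expected_version: str, ffmpeg_path: str = "ffmpeg") -> bool:
    """Return True if the FFmpeg binary on this worker reports the pinned version."""
    result = subprocess.run([ffmpeg_path, "-version"],
                            capture_output=True, text=True, check=True)
    return parse_ffmpeg_version(result.stdout) == expected_version
```

A worker could call `worker_matches_pin("6.1.1")` before pulling its first job and exit non-zero on failure, so a drifted container image shows up as a failed health check instead of as subtly different output.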

The watch-out: you can do all of the above and end up with a stack that's harder to maintain than the managed service you replaced. The discipline is: every component you self-host needs to be one your team is committed to operating, not just installing.

FFmpeg as a service, or self-hosted parity — same binary

The reason we built MpegFlow the way we did — FFmpeg as a service with the same binary running SaaS or self-hosted — is that the build-vs-buy decision shouldn't be a wall. Most teams don't know on day one whether they'll grow into self-host scale. They want to validate cheaply on a managed offering and graduate to self-host when the math justifies it, without rewriting their pipelines.

That's the bet: a single declarative pipeline definition (the DAG) and a single runtime that can dispatch jobs to a managed pool we operate, or to a self-hosted pool you operate, or to both. Switching is a config change, not a code change. The orchestration, retry semantics, audit trail, and DRM/caption integrations are the same on either side.

If that maps to where you are, the beta cohort is open. We're shipping the encoder MVP this quarter; you'll get an email when your slot can take traffic.

If you're earlier in the decision and want the operational depth on self-hosted FFmpeg specifically, the queue/retry/audit post covers the parts of self-hosting most teams under-build. If you're earlier still and trying to decide whether any of this should be a graph at all, the DAG thesis is where to start. If you've already decided to self-host and you want the concrete cost math on running encoders on AWS Spot / GCP Preemptible / Azure Spot, the cost-aware spot-instance encoder pool architecture is the next read — that's where the per-minute economics actually land. And for the underlying decision framework, build vs buy in 2026 treats "which layers do you build" as the actual question instead of a binary.

The honest one-line answer to "should I self-host?": probably not yet, and definitely not without doing the math. The marketing on either side will not do it for you.

