The question shows up in every video team's planning doc at some point: "what's our transcoding API?" Sometimes it surfaces as "should we keep paying MediaConvert?", sometimes as "we're hand-rolling FFmpeg in a Lambda and it's getting out of control," and sometimes as "we just signed an MSA with Bitmovin, are we done?"
This post is the framework for answering it honestly. Three real options, each with different economics, different lock-in profiles, and different operational shapes. None of them is right for everyone. The teams that get this decision right pick deliberately; the teams that get it wrong drift into whichever option had the smallest activation energy at the moment they needed video shipped.
The three options, named precisely
Build it yourself. You ship FFmpeg invocations from your application code. Maybe wrapped in a Lambda, maybe a worker pool, maybe a queue + workers on EC2. Your API surface is whatever you build on top — usually a thin REST endpoint that accepts a job spec and returns a status. You own everything: queue, retries, audit, scaling, security, packaging. We covered the operational depth this requires in running FFmpeg at scale and FFmpeg in Kubernetes.
Buy a managed transcoding API. AWS MediaConvert, Bitmovin Encoding, Mux Video, Cloudflare Stream: you submit a job spec via REST and the vendor returns outputs. You don't run any infrastructure. Your job specs are vendor-shaped (MediaConvert JSON, Bitmovin manifest, Mux asset). Pricing is typically metered per minute of video, though the exact meter varies by vendor.
Rent orchestration-as-a-platform. A control plane you run (or have run for you) that sits between your application and FFmpeg. You bring the codec choices and presets; the platform provides the queue, retry semantics, audit trail, multi-tenant security, and observability that every video team rebuilds from scratch. Your job specs are platform-shaped — declarative DAG manifests in our case, similar concept in adjacent products. This is the path MpegFlow is built around.
Each of those is a different abstraction level. Build = "I want bytes in, bytes out, FFmpeg under my control." Buy = "I want bytes in, bytes out, and to forget the encoder exists." Rent = "I want the orchestration layer abstracted but the encoder visible."
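To make the "build" abstraction level concrete, here is a minimal sketch of the thin layer that turns a job spec into an FFmpeg invocation. The `JobSpec` fields, file names, and bitrate values are hypothetical and chosen only for illustration; the FFmpeg flags themselves (`-i`, `-c:v`, `-vf scale`, `-b:v`, `-c:a`, `-b:a`) are standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical job spec a thin REST endpoint might accept."""
    source: str
    output: str
    height: int            # target frame height in pixels
    video_kbps: int        # target video bitrate
    audio_kbps: int = 128  # target audio bitrate


def ffmpeg_args(spec: JobSpec) -> list[str]:
    """Translate a job spec into an FFmpeg argv list (no shell involved)."""
    return [
        "ffmpeg", "-hide_banner", "-y",
        "-i", spec.source,
        "-c:v", "libx264",
        "-vf", f"scale=-2:{spec.height}",  # -2 preserves aspect ratio with an even width
        "-b:v", f"{spec.video_kbps}k",
        "-c:a", "aac",
        "-b:a", f"{spec.audio_kbps}k",
        spec.output,
    ]


spec = JobSpec(source="in.mov", output="out_720p.mp4", height=720, video_kbps=3000)
args = ffmpeg_args(spec)
print(" ".join(args))
```

A worker would pass `args` to `subprocess.run(args, check=True)` and capture stderr. Everything around that call (queue, retries, audit) is the part the build path makes you own.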
The economics, made concrete
Take a workload of 1M output minutes per month. Mid-tier VOD operator scale.
| Path | Per-minute cost | Monthly bill | Operational cost |
|---|---|---|---|
| Build (self-hosted on AWS spot, mixed instance fleet) | $0.0019 | $1,900 | 1 senior engineer at 30% time, ~$50K/yr |
| Buy — AWS MediaConvert (on-demand pricing) | $0.0150 | $15,000 | Near zero — vendor runs everything |
| Buy — Bitmovin (committed-volume tier) | $0.0090 | $9,000 | Low — TAM does light onboarding |
| Buy — Mux Video | ~$0.040 | $40,000 | Near zero |
| Rent — orchestration platform + your storage | ~$0.005 (depends on packaging) | ~$5,000 | Some — you operate the workers |
The dollar columns are the obvious comparison. The column teams skip past is the operational cost: engineer time spent on encoder ops, on-call coverage, queue tuning, encoder-version pinning, stderr parsing, and partial-success handling on ABR ladders. We've watched teams underestimate this by 4-10x.
The math also flips as volume changes. At 50K minutes/month, the table's rates put MediaConvert at $750 and build compute at $95, so the roughly $4,200/month of engineer time dwarfs the $655 saved. At 50M minutes/month the same rates give $750,000 for MediaConvert against $95,000 for build compute, and build is basically the only viable path because per-minute pricing at list rates becomes unaffordable. The rent column threads the middle.
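The volume crossover can be worked through with a small model. This is a sketch under loud assumptions: the per-minute rates are the table's figures held flat at every volume, the build path carries the table's ~$50K/yr of engineer time regardless of volume, and the other paths are modeled with zero operational cost. The `ops_multiplier` parameter is a hypothetical knob for the underestimation factor mentioned above.

```python
# Per-minute rates copied from the table above (USD per output minute).
RATES = {
    "build": 0.0019,
    "mediaconvert": 0.0150,
    "bitmovin": 0.0090,
    "mux": 0.040,
    "rent": 0.005,
}

# Assumption: only the build path carries a modeled operational cost.
BUILD_OPS_PER_MONTH = 50_000 / 12


def monthly_total(path: str, minutes: int, ops_multiplier: float = 1.0) -> float:
    """Bill plus modeled operational cost for one path at a given volume."""
    ops = BUILD_OPS_PER_MONTH * ops_multiplier if path == "build" else 0.0
    return RATES[path] * minutes + ops


def cheapest(minutes: int, ops_multiplier: float = 1.0) -> str:
    """Path with the lowest modeled monthly total."""
    return min(RATES, key=lambda p: monthly_total(p, minutes, ops_multiplier))


def break_even_minutes(path: str, ops_multiplier: float = 1.0) -> float:
    """Monthly volume above which build beats `path` in this model."""
    return BUILD_OPS_PER_MONTH * ops_multiplier / (RATES[path] - RATES["build"])


for minutes in (50_000, 1_000_000, 50_000_000):
    totals = {p: round(monthly_total(p, minutes)) for p in RATES}
    print(f"{minutes:>11,} min/month -> cheapest: {cheapest(minutes)}  {totals}")

print(round(break_even_minutes("mediaconvert")), round(break_even_minutes("rent")))
```

In this model build overtakes MediaConvert on-demand at roughly 320K minutes/month and overtakes rent at roughly 1.3M. The break-even scales linearly with `ops_multiplier`, so a 4x underestimate of engineer time pushes the build-versus-rent crossover to roughly 5.4M.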
Lock-in math
Per-minute price is the visible cost. Lock-in is the invisible one, and it's where most regret comes from in year three.
Build lock-in. Low. Your job specs are FFmpeg invocations. Your retry semantics, queue patterns, and audit logs are your code. Migrating off "build" means rewriting your operational layer, but the encoder choices stay the same. The lock-in is in the operational layer you wrote, not in any vendor's format.
Buy lock-in. High. MediaConvert job specs are MediaConvert-shaped — you can't take them to Bitmovin without translation. Webhooks have vendor-specific signing. Your billing, your IAM, your CloudTrail logs all reference vendor-specific surfaces. The longer you stay, the deeper the integration goes. Migration is real engineering work, often a multi-month project. We covered the MediaConvert migration shape on the alternatives page.
Rent lock-in. Variable. The orchestration platform's job spec format is a real abstraction layer — you'll have to translate it if you migrate. But the underlying FFmpeg invocations are visible (in MpegFlow they're emitted in the audit log), so the encoder choices are portable. Your storage stays yours. The lock-in is roughly the same shape as build but smaller — it's in the workflow definitions, not the operational layer or the data plane.
Ergonomics: how does the API feel day-to-day
This is the dimension that decides whether engineers grow to like or hate the choice over time.
Build: maximum control, maximum ergonomics if your team is video-infrastructure-literate. You can read every FFmpeg flag, you know exactly what runs, and you can modify behavior at any layer. The downside is that every problem becomes your team's problem — including the ones nobody warned you about (CDN purge backpressure, partial-success ladders, encoder-version pinning).
Buy: minimum control, maximum convenience. The job spec API is documented; you submit JSON, you get URLs. Most teams find this satisfying for the first six months and frustrating thereafter, when "why didn't FFmpeg use the threading I expected" becomes "why doesn't the vendor expose threading at all?" — a question the API simply doesn't answer.
Rent: middle ergonomics. You write declarative DAG manifests (or whatever the platform's primitive is) instead of raw FFmpeg invocations or vendor JSON. You can see what FFmpeg ran (audit trail), you can change codec behavior (parameters flow through), and you can see why retries happened. The trade-off: a learning curve for the orchestration model, which is foreign to teams used to the buy path or the build path.
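For readers who haven't worked with a declarative DAG manifest, the sketch below shows the general idea. The manifest shape and step names are hypothetical, not the format of MpegFlow or any other platform. Each step declares what it depends on, and a scheduler derives which steps can run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical manifest: each step names the steps it depends on.
# This is an illustrative shape, not the format of any specific platform.
manifest = {
    "probe":        {"needs": []},
    "encode_1080p": {"needs": ["probe"]},
    "encode_720p":  {"needs": ["probe"]},
    "encode_audio": {"needs": ["probe"]},
    "package_hls":  {"needs": ["encode_1080p", "encode_720p", "encode_audio"]},
}


def execution_waves(steps: dict) -> list[list[str]]:
    """Group steps into waves; every step in a wave can run in parallel."""
    sorter = TopologicalSorter({name: s["needs"] for name, s in steps.items()})
    sorter.prepare()  # raises graphlib.CycleError if the manifest has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(manifest)
for index, wave in enumerate(waves):
    print(index, wave)
```

The point of the declarative form is that retries, parallelism, and audit can be attached per step by the platform, rather than being threaded through hand-written invocation code.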
Decision matrix
Five questions. Answer them honestly and the matrix tells you which path fits.
| Question | If "yes" → | If "no" → |
|---|---|---|
| Is encoder/codec choice your moat? | Build | Buy or rent |
| Do you have a 2+ engineer video infra team? | Build viable | Buy or rent |
| Are you above 5M minutes/month? | Build economics work | Buy economics hurt; rent threads the middle |
| Does compliance forbid vendor metadata access? | Build (or rent self-hosted) | Buy works |
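| Do you need your first encoded video shipped within weeks? | Buy | Build or rent (both take longer) |
Tally where each answer points rather than counting raw yeses, because a "yes" to the last question points at Buy. Four or more answers pointing at Build → Build. Four or more pointing at Buy → Buy. A 3-2 split either way → Rent is probably your shape, and the decision becomes which platform.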
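One way to apply the matrix mechanically is to tally which path each answer points toward. With five yes/no questions, a three-vote threshold always produces a winner and never leaves a "mixed" case, so this sketch assumes a path needs at least four of the five answers to win outright and treats anything closer as Rent. That threshold, the question keys, and the choice to score "Buy or rent" cells as Buy are all assumptions made here for illustration.

```python
# Which path each answer points toward, following the table above.
# "Buy or rent" cells are scored as "buy"; "Build or rent" cells as "build".
QUESTIONS = {
    "codec_is_moat":          {True: "build", False: "buy"},
    "has_video_infra_team":   {True: "build", False: "buy"},
    "above_5m_minutes":       {True: "build", False: "buy"},
    "compliance_bars_vendor": {True: "build", False: "buy"},
    "need_video_in_weeks":    {True: "buy",   False: "build"},  # inverted question
}


def recommend(answers: dict[str, bool], clear_majority: int = 4) -> str:
    """Return 'build', 'buy', or 'rent' from five yes/no answers."""
    missing = QUESTIONS.keys() - answers.keys()
    if missing:
        raise ValueError(f"unanswered questions: {sorted(missing)}")
    votes = [QUESTIONS[q][answers[q]] for q in QUESTIONS]
    for path in ("build", "buy"):
        if votes.count(path) >= clear_majority:
            return path
    return "rent"  # no clear majority: mixed answers


mixed = {
    "codec_is_moat": False,
    "has_video_infra_team": True,
    "above_5m_minutes": False,
    "compliance_bars_vendor": True,
    "need_video_in_weeks": False,
}
print(recommend(mixed))
```

Lowering `clear_majority` to 3 reproduces a strict majority vote, at which point the function can never return "rent".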
The honest middle: hybrid
Many teams end up running two of these in parallel without realizing it. Common pattern: MediaConvert for the bulk of VOD (because it shipped first and the integration is wired in), with hand-rolled FFmpeg in a worker pool for the high-volume archive migration that became uncomfortable on per-minute pricing. Or Mux for the user-uploaded content path (because Mux's player and analytics are exemplary), with self-hosted for the broadcast-grade primary content (because broadcast wants encoder visibility and audit trails Mux doesn't expose).
Hybrid is not a failure mode — it's often the right answer. The mistake is hybrid-by-accident, where each workload landed where it did because of the activation energy in that quarter rather than a deliberate choice.
If you're going to be hybrid deliberately, the orchestration-as-platform path absorbs both halves cleanly: workloads can dispatch to a managed pool we run OR a self-hosted pool you run, behind the same DAG manifest. Switching is a config change, not a re-platform.
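As a rough illustration of "switching is a config change", the sketch below routes jobs to a pool purely from a mapping. The `Job` type, workload names, pool names, and manifest path are all hypothetical and do not describe any real platform's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    workload: str  # e.g. "vod" or "archive-migration"
    manifest: str  # the same manifest is used regardless of pool


# Hypothetical routing config: workload name -> worker pool.
# Moving a workload between pools means editing this mapping, nothing else.
ROUTING = {
    "vod": "managed",
    "archive-migration": "self-hosted",
}
DEFAULT_POOL = "managed"


def pool_for(job: Job, routing: dict[str, str] = ROUTING) -> str:
    """Pick the worker pool for a job from config alone."""
    return routing.get(job.workload, DEFAULT_POOL)


job = Job(workload="archive-migration", manifest="ladders/archive.yaml")
print(pool_for(job))
# Flip the workload to the managed pool without touching the job or its manifest.
print(pool_for(job, {**ROUTING, "archive-migration": "managed"}))
```

Keeping the routing decision out of the job and the manifest is what makes the hybrid deliberate: each workload's placement is a visible line of config that someone chose.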
What the API choice signals about your business
Not every team gets to make this choice freely. The choice is constrained by:
- Procurement maturity. Enterprise procurement teams default to "buy" because they have established relationships with cloud vendors and don't want one more MSA. The discount they extract on consolidated cloud spend often makes per-minute pricing more palatable than the math suggests.
- Engineering hiring posture. Teams that hire video infrastructure engineers can build. Teams that don't, can't.
- Capital posture. "Buy" looks better on the income statement (operating expense) than "build" (capital + headcount). For VC-funded startups this matters less than for established broadcasters; for public companies it matters more.
- Regulatory pressure. EU sovereign-cloud, healthcare PHI, defense — these often eliminate the buy path entirely because the vendor's metadata access is the dealbreaker.
A team's API choice tells you about their organization's tolerances as much as their technical preferences. The right framework respects that.
What we actually believe
We built MpegFlow as the rent path because we kept watching teams burn 6-12 months building the operational layer around FFmpeg from scratch — the same queue, the same retry classifier, the same audit trail, the same multi-tenant security pattern, in five different companies. That work is non-differentiating. The codec choices, the QC rules, the broadcast-spec presets — those are differentiating, and they should stay with the team that has the expertise.
If you're early enough that "build vs buy vs rent" is genuinely open for you, run the matrix above. If you're already in one of the three and questioning the choice, the honest framework above is more useful than vendor sales pitches: trust the matrix, not the marketing.
If the rent path matches your shape, the design partner program is the next step. If you're earlier and trying to decide whether to build, running FFmpeg at scale is what we wish we'd had on day one. If you're at the volume where managed services hurt, the self-hosted economics writeup has the cost math that drives the conversion.
The API question is a year-defining decision. Don't make it on activation energy.