ABR ladder VMAF calibration — finding the right bitrate per rung for your content

MpegFlow

Practical guide to ABR ladder calibration via VMAF — representative content selection, per-rung VMAF target setting, bitrate sweep procedure, identifying ladder gaps, iteration.

ABR ladder calibration via VMAF is the practical procedure for picking bitrates per ladder rung that hit specific quality targets on your specific content. Generic bitrate recommendations are starting points; calibration gives you the actual bitrates that produce, say, VMAF 92 at 1080p HEVC for YOUR content. This page is the engineering reference for the calibration procedure.

Why calibrate

Generic ladder recommendations (1080p HEVC at 4-6 Mbps; 720p HEVC at 2-3 Mbps) are starting points. Your specific content may need different bitrates to hit equivalent quality. Reasons:

Content complexity differences — sports vs animation vs talking heads have different optimal bitrates.
Encoder version differences — newer x265 versions are slightly more efficient than older ones.
Encoder preset differences — slow preset vs medium preset have different quality at same bitrate.
Color space and bit depth — HDR content vs SDR; 10-bit vs 8-bit differ in quality-bitrate relationship.

Calibration tells you the actual bitrates for your specific configuration on your specific content. Generic recommendations can land 10-30% above or below the bitrate your content actually needs.

The calibration procedure

The procedure:

Step 1: Pick representative content samples.

Choose at least 5-10 clips covering your production content variety, leaning toward 10 or more when the catalog is diverse:

A talking head segment (low complexity).
A sports clip (high motion).
A nature documentary (mid complexity, color-rich).
An animation clip (sharp edges, flat regions).
A drama scene (mid complexity, lighting variation).

Each clip 30-90 seconds. Together they span your production content space.

Step 2: Define per-rung quality targets.

For each ladder rung, specify the VMAF target:

4K top tier: VMAF 95+ (premium quality).
1080p top: VMAF 93.
1080p mid: VMAF 88.
720p: VMAF 85.
540p: VMAF 80.
360p (floor): VMAF 75.

These thresholds are decisions, not facts. Higher targets = more bandwidth; lower targets = lower quality. Pick based on your audience expectations.
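One way to keep these decisions explicit is to write them down as data that the sweep and the selection step both read. The following is a minimal sketch: the `RungTarget` type, the rung names, and the 16:9 pixel dimensions are illustrative assumptions, and only the VMAF numbers come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RungTarget:
    name: str          # hypothetical rung label
    width: int         # assumed 16:9 dimensions
    height: int
    vmaf_target: float


# Ordered top rung first; targets copied from the list above.
RUNG_TARGETS = [
    RungTarget("4k_top", 3840, 2160, 95.0),
    RungTarget("1080p_top", 1920, 1080, 93.0),
    RungTarget("1080p_mid", 1920, 1080, 88.0),
    RungTarget("720p", 1280, 720, 85.0),
    RungTarget("540p", 960, 540, 80.0),
    RungTarget("360p_floor", 640, 360, 75.0),
]


def check_targets_descend(rungs):
    """Return True when every rung's target is below the rung above it."""
    return all(upper.vmaf_target > lower.vmaf_target
               for upper, lower in zip(rungs, rungs[1:]))
```

A check like `check_targets_descend` catches a mistyped target before any compute is spent on the sweep.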

Step 3: Encode each clip at multiple bitrates per rung.

For each (clip, rung) combination, encode at 5-7 bitrates spanning a range. For 1080p HEVC, bitrates might be 2, 3, 4, 5, 6, 8, 10 Mbps.

This is significant compute — 10 clips × 6 rungs × 7 bitrates = 420 encodes per calibration cycle.
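The size of the sweep is easy to confirm by enumerating the jobs before launching anything. This sketch uses hypothetical clip and rung names, and reuses the 1080p HEVC example sweep for every rung purely to count jobs; a real plan would use a different bitrate range per rung.

```python
def plan_sweep(clips, rung_bitrates_kbps):
    """Return one (clip, rung, bitrate_kbps) job per encode in the sweep."""
    jobs = []
    for clip in clips:
        for rung, bitrates in rung_bitrates_kbps.items():
            for kbps in bitrates:
                jobs.append((clip, rung, kbps))
    return jobs


# Hypothetical names, for counting only.
clips = [f"clip_{i:02d}" for i in range(10)]
rungs = ["4k_top", "1080p_top", "1080p_mid", "720p", "540p", "360p_floor"]
example_sweep_kbps = [2000, 3000, 4000, 5000, 6000, 8000, 10000]

jobs = plan_sweep(clips, {rung: example_sweep_kbps for rung in rungs})
print(len(jobs))  # 420
```

Because each job is an independent tuple, the list can be handed directly to whatever queue or worker pool runs the encodes.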

Step 4: Compute VMAF per encode.

Run VMAF measurement on each encoded version against the original. Record (clip, rung, bitrate, VMAF) tuples.
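A common way to run the measurement is FFmpeg's libvmaf filter, which takes the distorted video as its first input and the reference as its second. The sketch below only builds the command and parses the log. It assumes an FFmpeg build with libvmaf enabled and the JSON log layout of recent libvmaf versions (a `pooled_metrics` object); the function names and file paths are illustrative. The encode is scaled to the reference dimensions first because libvmaf compares frames of identical size.

```python
import json


def build_vmaf_command(distorted, reference, width, height, log_path):
    """Build an ffmpeg command that scores `distorted` against `reference`."""
    graph = (
        # Upscale the encode to the reference size before comparison.
        f"[0:v]scale={width}:{height}:flags=bicubic[dist];"
        # First filter input is the distorted video, second is the reference.
        f"[dist][1:v]libvmaf=log_fmt=json:log_path={log_path}"
    )
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", graph, "-f", "null", "-"]


def pooled_vmaf_mean(log_text):
    """Extract the mean VMAF score from a libvmaf JSON log (assumed layout)."""
    return float(json.loads(log_text)["pooled_metrics"]["vmaf"]["mean"])
```

The command list can be passed to `subprocess.run(cmd, check=True)`, after which the log file is read and handed to `pooled_vmaf_mean`. Keep `log_path` free of colons, since the colon separates options inside the filter string.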

Step 5: Find target bitrate per rung per clip.

For each (clip, rung), find the lowest bitrate where VMAF >= the target threshold. This is the calibrated bitrate for that combination.

For each rung, take the maximum across clips (the worst-case bitrate that hits target). This is your production bitrate for that rung — it ensures all content types hit the quality target.
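The selection rule can be sketched directly from the recorded tuples. The measurements below are made-up numbers for illustration only, and the function names are hypothetical; the logic is the lowest passing bitrate per clip, then the maximum across clips.

```python
def lowest_passing_bitrate(points, target):
    """points: iterable of (bitrate_kbps, vmaf). Return the lowest bitrate
    whose VMAF meets the target, or None if no tested bitrate does."""
    passing = [kbps for kbps, vmaf in points if vmaf >= target]
    return min(passing) if passing else None


def production_bitrate(per_clip_points, target):
    """Return (worst-case bitrate across clips, per-clip calibrated bitrates)."""
    chosen = {}
    for clip, points in per_clip_points.items():
        kbps = lowest_passing_bitrate(points, target)
        if kbps is None:
            raise ValueError(f"{clip} never reaches VMAF {target}; widen the sweep")
        chosen[clip] = kbps
    return max(chosen.values()), chosen


# Made-up measurements for one rung, illustration only.
measurements = {
    "talking_head": [(2000, 90.1), (3000, 93.4), (4000, 95.0)],
    "sports": [(2000, 81.0), (3000, 86.2), (4000, 89.9), (5000, 92.3), (6000, 93.8)],
}
rung_kbps, per_clip = production_bitrate(measurements, target=92.0)
```

With these numbers the talking head clip passes at 3000 kbps and the sports clip at 5000 kbps, so the rung is set to 5000 kbps. The answer is only as fine-grained as the sweep: the true crossing point for sports lies somewhere between 4000 and 5000 kbps.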

Step 6: Iterate.

After running production for a while, gather user feedback (subjective complaints, A/B test results). Adjust thresholds or rung bitrates if needed.

Example calibration result

For a streaming service running this calibration:

Content type	Encoder and preset	Bitrate for VMAF 92 at 1080p
Talking head	x265 medium	2.5 Mbps
Drama	x265 medium	3.8 Mbps
Animation	x265 medium	3.2 Mbps
Sports	x265 medium	5.2 Mbps
Nature documentary	x265 medium	4.5 Mbps

The maximum is 5.2 Mbps (sports). Production 1080p HEVC bitrate: 5.2 Mbps.

If most content is talking head (which needs only 2.5 Mbps), shipping 5.2 Mbps spends roughly twice the bitrate the easy content needs. Two options:

Universal ladder: ship 5.2 Mbps for all content. Simpler operations; some bandwidth waste on easy content.
Per-content tuning (per-title encoding): customize bitrate per content. More complex; less waste.
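The cost of the universal option can be quantified from the table itself. This sketch takes the calibrated bitrates from the example above and computes, per content type, the share of the shipped bitrate that goes beyond what the content needs; the dictionary keys and function name are illustrative.

```python
# Calibrated bitrates (Mbps) from the example table.
calibrated_mbps = {
    "talking_head": 2.5,
    "drama": 3.8,
    "animation": 3.2,
    "sports": 5.2,
    "nature_documentary": 4.5,
}

# The universal ladder ships the worst-case bitrate to everything.
universal_mbps = max(calibrated_mbps.values())


def overspend_fraction(needed_mbps, shipped_mbps):
    """Fraction of the shipped bitrate that exceeds what the content needs."""
    return (shipped_mbps - needed_mbps) / shipped_mbps


overspend = {
    name: round(overspend_fraction(mbps, universal_mbps), 3)
    for name, mbps in calibrated_mbps.items()
}
```

For talking head content about 52% of the shipped bits are beyond what the VMAF target requires, while sports content has no overspend. Weighting these fractions by how much of the catalog each content type represents gives a rough estimate of what per-title encoding could save.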

For production at scale, per-title encoding is increasingly the answer. See per-title encoding for that workflow.

Per-content vs universal ladder

The choice:

Universal ladder:

One set of bitrates for all content.
Operationally simple.
Bitrate calibrated to worst-case content (typically high-motion or grainy).
Wastes bandwidth on easier content.

Per-content (per-title):

Bitrate tuned per asset based on actual content.
More complex pipeline (analysis encodes, content-aware encoding).
Bandwidth-optimal per asset.
Requires more encoding compute upfront.

Calibration-based universal ladder is the right choice for:

Pipelines without per-title infrastructure.
Content that's relatively homogeneous.
Volume-tier streaming where operational simplicity matters.

Per-title is the right choice for:

Premium streaming where bandwidth costs are high.
Content with significant complexity variation.
Pipelines with mature encoding infrastructure.

For most pipelines starting out, calibrated universal ladder is sufficient. Migrate to per-title as scale and sophistication grow.

Identifying ladder gaps

Calibration may reveal gaps in ladder design:

Gap 1: Tier overlap

Two adjacent rungs produce nearly identical VMAF on average content:

1080p HEVC at 4 Mbps: VMAF 90.
720p HEVC at 2.5 Mbps: VMAF 89.

Here the 1080p rung costs 1.5 Mbps more than the 720p rung for about one VMAF point. The two rungs are effectively redundant: drop one, or move their bitrates further apart.

Gap 2: Quality cliff

Two adjacent rungs have a large quality gap:

1080p HEVC at 4 Mbps: VMAF 91.
720p HEVC at 2 Mbps: VMAF 78.

The drop between 1080p and 720p is too steep; adaptive clients see a quality crash when bandwidth drops. Add an intermediate rung (1080p at 3 Mbps or 900p at 2.5 Mbps).

Gap 3: Floor too low

The lowest rung produces VMAF too low to be acceptable:

360p H.264 at 600 kbps: VMAF 68.

A VMAF of 68 means viewers will notice quality problems, and it sits below the VMAF 75 floor target from Step 2. Decide whether that is acceptable for your audience; if not, raise the floor rung's bitrate until it meets the target.

Gap 4: Top too high

The highest rung is over-engineered:

4K HEVC at 15 Mbps: VMAF 96.5.

If VMAF 95 is sufficient for premium quality, the bitrate spent on the extra 1.5 points is wasted. Reduce the top-tier bitrate.
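The four checks above are mechanical enough to script. In this sketch the floor and top limits reuse the Step 2 targets (75 and 95), while the overlap and cliff thresholds (2 and 10 VMAF points) are assumptions chosen for illustration; tune all four to your own ladder policy.

```python
def find_ladder_gaps(rungs, overlap_delta=2.0, cliff_delta=10.0,
                     floor_min=75.0, top_max=95.0):
    """rungs: list of (name, vmaf), ordered top rung first.

    Returns a list of (finding, rung, adjacent_rung_or_None) tuples.
    """
    findings = []
    # Compare each rung with the one directly below it.
    for (upper, upper_vmaf), (lower, lower_vmaf) in zip(rungs, rungs[1:]):
        delta = upper_vmaf - lower_vmaf
        if delta < overlap_delta:
            findings.append(("overlap", upper, lower))
        elif delta > cliff_delta:
            findings.append(("cliff", upper, lower))
    if rungs and rungs[-1][1] < floor_min:
        findings.append(("floor_too_low", rungs[-1][0], None))
    if rungs and rungs[0][1] > top_max:
        findings.append(("top_too_high", rungs[0][0], None))
    return findings


# The examples from this section, one ladder fragment per gap type.
overlap = find_ladder_gaps([("1080p", 90.0), ("720p", 89.0)])
cliff = find_ladder_gaps([("1080p", 91.0), ("720p", 78.0)])
floor = find_ladder_gaps([("360p", 68.0)])
top = find_ladder_gaps([("4k", 96.5)])
```

Run it on per-rung VMAF averaged over the corpus, since the gap definitions above are stated for average content rather than for the worst-case clip.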

Calibration tooling

For pipelines, calibration is typically scripted:

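def calibrate_ladder(content_corpus, encoder_config, rung_targets):
    """Run the calibration sweep and return the calibrated bitrate per rung.

    generate_bitrate_sweep, encode, and compute_vmaf are pipeline-specific
    helpers that must be defined elsewhere in the pipeline.
    """
    results = {}

    for clip in content_corpus:
        for rung_name, (resolution, codec, vmaf_target) in rung_targets.items():
            # Sweep in ascending order so the first hit is the lowest passing bitrate.
            bitrate_sweep = sorted(generate_bitrate_sweep(rung_name))

            for bitrate in bitrate_sweep:
                encoded = encode(clip, encoder_config, resolution, codec, bitrate)
                vmaf = compute_vmaf(clip, encoded)

                if vmaf >= vmaf_target:
                    # Found the target bitrate for this clip + rung.
                    results.setdefault(rung_name, []).append(bitrate)
                    break
            else:
                # No candidate reached the target. Fail loudly: silently skipping
                # the clip would understate the worst-case bitrate for this rung.
                raise ValueError(
                    f"{clip!r} never reached VMAF {vmaf_target} on {rung_name}; "
                    "extend the bitrate sweep"
                )

    # Per rung: take the max across clips (worst case).
    calibrated = {rung: max(bitrates) for rung, bitrates in results.items()}
    return calibrated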

Run on a content corpus; produces calibrated bitrate per rung.

Re-calibration triggers

Re-calibrate when:

Encoder version changes — new encoder version may have different efficiency.
Encoder preset changes — switching from medium to slow changes the curve.
Codec changes — adding AV1 to a previously HEVC-only ladder.
Quality target changes — raising or lowering target VMAF per rung.
Audience feedback — quality complaints suggest under-calibration.
Periodic refresh — every 6-12 months for hygiene.

Each calibration is significant compute; don't run continuously. Trigger explicitly when there's a reason.
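Most of these triggers can be detected automatically by comparing the record of the last calibration with the current pipeline configuration. A minimal sketch, assuming a hypothetical `CalibrationRecord` shape and a 365-day maximum age (the upper end of the 6-12 month range); audience feedback is a human signal and is not covered here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CalibrationRecord:
    calibrated_on: date
    encoder_version: str
    preset: str
    codecs: frozenset
    vmaf_targets: tuple  # ((rung_name, target), ...)


def recalibration_reasons(last, current, today, max_age_days=365):
    """Return the list of triggers that fire; an empty list means no re-run."""
    reasons = []
    if current.encoder_version != last.encoder_version:
        reasons.append("encoder version changed")
    if current.preset != last.preset:
        reasons.append("encoder preset changed")
    if current.codecs != last.codecs:
        reasons.append("codec set changed")
    if current.vmaf_targets != last.vmaf_targets:
        reasons.append("quality targets changed")
    if (today - last.calibrated_on).days > max_age_days:
        reasons.append("periodic refresh due")
    return reasons
```

Storing the record alongside the calibration results also gives you the calibration date, encoder version, and targets in one place when someone later asks where a ladder bitrate came from.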

Operational considerations

Things that matter for calibration in production:

Content corpus refresh — keep the calibration corpus current with production content. Old corpus may not represent new content patterns.
Statistical significance — small corpora may not span quality variation; aim for 10+ clips minimum.
Calibration documentation — record the calibration date, encoder version, content corpus, results. Future you will appreciate it.
Pipeline updates — when calibration completes, update production ladder configurations. Verify deployment works as intended.
A/B testing post-calibration — verify that calibrated ladder behaves as expected in production.

Multi-codec ladder calibration

When the ladder includes multiple codecs (AV1 top + HEVC mid + H.264 floor), calibration becomes more complex:

Per-codec calibration:

Each codec has its own efficiency curve. As a rough illustration, AV1 at 1080p might hit VMAF 92 at ~3 Mbps, where HEVC needs ~4.5 Mbps and H.264 ~7 Mbps for the same target; the actual numbers depend on content and encoder settings. Calibrate each codec independently per rung.

Cross-codec quality consistency:

For rungs that exist in multiple codec variants (e.g., 1080p AV1 and 1080p HEVC), the VMAF should be similar — players adapting between AV1 and HEVC at 1080p shouldn't see quality changes. Calibrate to consistent VMAF across codec variants at the same resolution.
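A consistency check across codec variants can be sketched as a grouping by resolution followed by a spread test. The 1.0-point tolerance and the sample scores below are assumptions for illustration, not measured results.

```python
from collections import defaultdict


def inconsistent_resolutions(variants, tolerance=1.0):
    """variants: iterable of (resolution, codec, vmaf).

    Returns {resolution: spread} for resolutions whose codec variants
    differ by more than `tolerance` VMAF points.
    """
    by_resolution = defaultdict(list)
    for resolution, _codec, vmaf in variants:
        by_resolution[resolution].append(vmaf)
    return {
        resolution: max(scores) - min(scores)
        for resolution, scores in by_resolution.items()
        if max(scores) - min(scores) > tolerance
    }


# Made-up calibrated scores, illustration only.
variants = [
    ("1080p", "av1", 92.4),
    ("1080p", "hevc", 92.0),
    ("720p", "av1", 88.0),
    ("720p", "hevc", 85.5),
]
flagged = inconsistent_resolutions(variants)
```

Here only the 720p pair is flagged, with a spread of 2.5 points; the usual fix is to re-run the sweep for the lower-scoring codec at that resolution against the shared target.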

Audience-weighted calibration:

Different rungs serve different audience subsets. Top AV1 tier serves AV1-capable; mid HEVC tier serves HEVC-capable; floor H.264 tier serves everyone. Audience size per tier affects how much investment in calibration each tier deserves.

For premium streaming with significant audience in each codec tier, calibrate all of them to consistent quality. For pipelines where the floor is rarely served, less calibration effort on the floor is fine.

Calibration cadence

How often to re-calibrate:

Per-encoder-update: when SVT-AV1 or x265 ships a new version, re-calibrate. The improvement may shift the curves.
Per-content-strategy-shift: when the production content type changes (new genres, different languages, different production styles), re-calibrate.
Every 6-12 months: hygiene re-calibration. Even without specific changes, a periodic sanity check catches drift.
Per-customer-feedback-cycle: if quality complaints come from a specific tier, re-calibrate that tier with the complaint pattern in mind.

Don't re-calibrate continuously — calibration is expensive (significant compute time on the bitrate sweep). Trigger explicitly.

What MpegFlow does with ABR ladder calibration

MpegFlow runs the calibration encoding portion as a multi-stage workflow: the partitioner splits the bitrate sweep into parallel encode stages (one per candidate bitrate per rung), each stage runs on an FfmpegExecutor worker, and a downstream measurement stage runs the libvmaf filter via the quality-analysis node to compute VMAF for each candidate against source. Cross-stage data flow wires the encode outputs into the measurement input. Results land in the workflow's metadata storage with per-(rung, bitrate) VMAF scores.

The selection step — applying VMAF target thresholds to pick the recommended bitrate per rung — runs today as analysis output that an operator reviews. Automatic per-rung enforcement (a decision node that re-encodes when VMAF misses target without human review) is on the roadmap; today the loop closes with operator review and a YAML update.

For production deployment, calibrated bitrates flow into workflow YAML as the per-rung configuration. The encoder pool runs production encoding with the calibrated values via the same DAG runtime that handles non-calibrated workflows — same executors, same retry semantics, same audit trail.

For customers without time for full calibration, MpegFlow provides default ladder templates with sensible bitrates per content type. These are starting points; customers calibrate against their specific content for production.

The strict-broker security model handles calibration workflows the same as standard encoding — workers receive content via short-lived presigned URLs, run analysis encodes, compute VMAF, emit results.

The general guidance: calibration is an occasional investment that pays off across years of pipeline operation. Generic bitrates work; calibrated bitrates work better. For any pipeline shipping at meaningful streaming volume, the calibration cost is dwarfed by the bandwidth savings or quality improvements it enables.

