
BD-rate calculation — how to compare codecs and encoder configurations

Practical guide to Bjontegaard delta-rate calculation — the math, multi-bitrate encoding procedure, BD-rate VMAF vs PSNR, libbjontegaard tooling, interpreting results.

By MpegFlow Engineering Team · Quality · May 8, 2026 · 9 min read · 1,821 words
In this topic
  1. What BD-rate measures
  2. The math
  3. The encoding procedure
  4. Tooling for BD-rate
  5. BD-rate VMAF vs PSNR vs SSIM
  6. Interpreting BD-rate
  7. BD-rate caveats
  8. BD-rate analysis workflow
  9. Common BD-rate analysis mistakes
  10. Production BD-rate use cases
  11. Operational considerations
  12. What MpegFlow does with BD-rate

BD-rate (Bjontegaard delta rate) is the standard way to express "codec A is X% more efficient than codec B at equivalent quality." Every codec comparison paper since the early 2000s reports BD-rate. Every encoder comparison in production uses it. The calculation is precise but non-obvious — you encode at multiple bitrates with each configuration, plot bitrate vs quality, then compute the area between the curves. This page is the engineering reference for the BD-rate calculation procedure and how to interpret results.

#What BD-rate measures

BD-rate (Bjontegaard delta rate) compares two codec/encoder configurations:

  • Negative BD-rate = the second config achieves the same quality at lower bitrate (the second config is more efficient).
  • Positive BD-rate = the second config requires higher bitrate for the same quality (the second config is less efficient).
  • Zero BD-rate = configurations are equivalent.

A "BD-rate VMAF of -25%" means: averaged over the measured quality range, configuration B delivers the same VMAF as configuration A at 25% lower bitrate. Substantially more efficient.

The unit is percentage (proportional bitrate difference), not absolute. This makes BD-rate comparable across content types and quality ranges.
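Because the unit is proportional, translating a BD-rate into an equivalent bitrate at a fixed quality target is a single multiplication. A minimal sketch (the function name and figures are illustrative, not from any library):

```python
def equivalent_bitrate(bitrate_a_kbps: float, bd_rate_percent: float) -> float:
    """Bitrate config B needs for the same quality, given B's BD-rate vs A."""
    return bitrate_a_kbps * (1 + bd_rate_percent / 100)

# If A needs 4000 kbps at a given VMAF target and B's BD-rate is -25%,
# B needs roughly 4000 * 0.75 = 3000 kbps for the same quality.
print(equivalent_bitrate(4000, -25))  # 3000.0
```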

#The math

The procedure:

  1. Encode the same source at multiple bitrates with codec/encoder A. Get a set of (bitrate, quality) data points.
  2. Encode the same source at the same multiple bitrates with codec/encoder B. Get another set of (bitrate, quality) data points.
  3. Fit a curve to each set (classically a third-order polynomial of log-bitrate as a function of quality; some modern tools use piecewise-cubic interpolation instead).
  4. Compute the area between the curves within the overlapping quality range.
  5. Express the area as a percentage of the average bitrate.

The result is BD-rate.

The curve fit and integration are done in log-bitrate space because compression efficiency typically scales logarithmically with bitrate. Using log space makes the curves more linear and the integration more accurate.
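Written out, with fitted curves $\log_{10} R_A(q)$ and $\log_{10} R_B(q)$ giving log-bitrate as a function of quality, the delta rate over the overlapping quality range $[q_{\min}, q_{\max}]$ is:

```latex
\mathrm{BD\text{-}rate} =
\left(
  10^{\dfrac{1}{q_{\max}-q_{\min}}
      \displaystyle\int_{q_{\min}}^{q_{\max}}
      \bigl[\log_{10} R_B(q) - \log_{10} R_A(q)\bigr]\,dq}
  \;-\; 1
\right) \times 100\%
```

The exponent is the average log-bitrate difference; exponentiating turns it back into a proportional bitrate difference.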

#The encoding procedure

To compute BD-rate, you need at least 4-5 (bitrate, quality) data points per configuration. The typical procedure:

Step 1: Pick representative content.

Use a corpus of test sequences. Common choices:

  • JVET test sequences — the industry standard for codec comparison papers.
  • Netflix test set — used in VMAF training; good for streaming-aligned tests.
  • Custom corpus — your specific content (recommended for production decisions).

Step 2: Pick bitrates spanning the quality range.

For a 1080p video, typical bitrates: 1, 2, 4, 6, 10 Mbps. Adjust based on what you're testing.

The bitrates should produce VMAF values from ~75 (lower end) to ~95+ (high end). If your range doesn't cover this, BD-rate becomes less meaningful.
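A quick guard for this is worth automating before running the BD-rate calculation itself. A hedged sketch (function name and default thresholds are illustrative assumptions):

```python
def spans_target_range(vmaf_scores, low=75.0, high=95.0):
    """Check that measured VMAF scores bracket the quality range
    a BD-rate analysis should cover (~75 at the low end, ~95+ at the top)."""
    return min(vmaf_scores) <= low and max(vmaf_scores) >= high

print(spans_target_range([72.0, 85.5, 92.1, 96.3]))  # True
print(spans_target_range([88.0, 93.0, 96.0, 98.0]))  # False: low end not covered
```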

Step 3: Encode at each bitrate.

For each codec/encoder configuration, encode the source content at each chosen bitrate. Use VBR or capped CRF (not CRF, because CRF doesn't target a specific bitrate).
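For an FFmpeg-based pipeline, a VBR encode targeting a specific average bitrate typically combines `-b:v` with `-maxrate`/`-bufsize`. A sketch that builds the command line (filenames and the 1.5x maxrate heuristic are illustrative assumptions, not a recommendation):

```python
def vbr_encode_cmd(src, dst, bitrate_kbps, codec="libx265"):
    """Build an ffmpeg argv for a VBR encode at a target average bitrate.
    Filenames and the maxrate/bufsize heuristic are illustrative."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", codec,
        "-b:v", f"{bitrate_kbps}k",                  # target average bitrate
        "-maxrate", f"{int(bitrate_kbps * 1.5)}k",   # cap peaks (common heuristic)
        "-bufsize", f"{bitrate_kbps * 2}k",
        "-an", dst,                                  # drop audio; metrics are video-only
    ]

cmd = vbr_encode_cmd("source.y4m", "out_4000k.mp4", 4000)
print(" ".join(cmd))
```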

Step 4: Compute quality at each bitrate.

Run VMAF (or PSNR / SSIM) on each encoded version against the source. Record the results.

Step 5: Run BD-rate calculation.

Use a BD-rate calculator (libbjontegaard, custom Python script, etc.) on the (bitrate, quality) point sets.

#Tooling for BD-rate

libbjontegaard — Python library:

from libbjontegaard import bd_rate

# Configuration A: codec A at 4 bitrates
bitrates_a = [1000, 2000, 4000, 8000]
quality_a = [78.5, 86.2, 91.8, 95.1]  # VMAF scores

# Configuration B: codec B at the same 4 bitrates
bitrates_b = [1000, 2000, 4000, 8000]
quality_b = [82.1, 88.9, 93.2, 96.0]  # VMAF scores

# Compute BD-rate
bd_rate_value = bd_rate(bitrates_a, quality_a, bitrates_b, quality_b)
print(f"BD-rate: {bd_rate_value:.2f}%")

The BD-rate is negative if B is more efficient than A; positive if A is more efficient.

Custom implementation:

import numpy as np

def bd_rate(bitrate_a, quality_a, bitrate_b, quality_b):
    """Compute Bjontegaard delta rate (percent) of config B relative to config A."""
    log_br_a = np.log10(bitrate_a)
    log_br_b = np.log10(bitrate_b)
    
    # Fit third-order polynomials: the classic Bjontegaard fit.
    # Four rate points determine a cubic exactly; a fourth-order fit
    # would be underdetermined with only four points.
    p_a = np.polyfit(quality_a, log_br_a, 3)
    p_b = np.polyfit(quality_b, log_br_b, 3)
    
    # Integration range: overlap of the two quality ranges
    qmin = max(min(quality_a), min(quality_b))
    qmax = min(max(quality_a), max(quality_b))
    
    # Integrate log-bitrate over the common quality interval
    int_a = np.polyint(p_a)
    int_b = np.polyint(p_b)
    integral_a = np.polyval(int_a, qmax) - np.polyval(int_a, qmin)
    integral_b = np.polyval(int_b, qmax) - np.polyval(int_b, qmin)
    
    # Average difference in log-bitrate, converted back to a percentage
    avg_diff = (integral_b - integral_a) / (qmax - qmin)
    return (10 ** avg_diff - 1) * 100

The math is well-defined; implementations vary in numerical stability. For production use, libbjontegaard is the standard.
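As a sanity check, running the fit-and-integrate logic on the sample data from the libbjontegaard example above should produce a negative BD-rate, since configuration B scores higher VMAF at every bitrate. A standalone sketch using the classic third-order fit:

```python
import numpy as np

def bd_rate_cubic(rates_a, qual_a, rates_b, qual_b):
    """Classic Bjontegaard delta rate: cubic fit of log-bitrate vs quality,
    integrated over the overlapping quality range."""
    p_a = np.polyfit(qual_a, np.log10(rates_a), 3)
    p_b = np.polyfit(qual_b, np.log10(rates_b), 3)
    qmin = max(min(qual_a), min(qual_b))
    qmax = min(max(qual_a), max(qual_b))
    int_a, int_b = np.polyint(p_a), np.polyint(p_b)
    ia = np.polyval(int_a, qmax) - np.polyval(int_a, qmin)
    ib = np.polyval(int_b, qmax) - np.polyval(int_b, qmin)
    avg_diff = (ib - ia) / (qmax - qmin)
    return (10 ** avg_diff - 1) * 100

# B reaches higher VMAF at every bitrate, so B is more efficient
# and the BD-rate must come out negative.
bd = bd_rate_cubic([1000, 2000, 4000, 8000], [78.5, 86.2, 91.8, 95.1],
                   [1000, 2000, 4000, 8000], [82.1, 88.9, 93.2, 96.0])
print(f"{bd:.1f}%")
```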

#BD-rate VMAF vs PSNR vs SSIM

Different quality metrics produce different BD-rate values for the same comparison:

  • BD-rate PSNR — historical standard. What codec papers report. Tends to favor codecs that mathematically optimize PSNR.
  • BD-rate SSIM — middle ground. Slightly better correlation with perception than PSNR.
  • BD-rate VMAF — modern standard for streaming. Best correlation with perception.

The differences can be substantial:

  • AV1 vs HEVC, BD-rate PSNR: ~30% improvement.
  • AV1 vs HEVC, BD-rate VMAF: ~25-35% improvement (similar but content-dependent).

For codec research, BD-rate PSNR is the historical baseline. For production decisions, BD-rate VMAF is the more meaningful number.

Most modern papers report all three (BD-rate PSNR, SSIM, VMAF) for completeness.

#Interpreting BD-rate

A BD-rate value tells you the efficiency difference. Translating to practical implications:

  • -30% BD-rate (B 30% more efficient than A): For a 100 PB/year streaming service, switching from A to B saves ~30 PB/year of bandwidth. Compounds over years.

  • -15% BD-rate (B 15% more efficient): Modest improvement. May or may not justify migration costs depending on scale.

  • -5% BD-rate: Within noise margin for many comparisons. Probably not worth migrating for.

  • +5% BD-rate (B is less efficient): Configuration A is better. Migration would be wrong direction.

  • +15% BD-rate: A is meaningfully better. Worth understanding why before considering B.

The threshold for "worth migrating" depends on:

  • Your streaming volume (more volume = more value per percentage point).
  • Migration cost (compute, ops, testing).
  • Audience reach (B's lower bitrate only matters if your audience can decode B).
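The volume side of that tradeoff is simple arithmetic; a minimal sketch (the function name and figures are illustrative):

```python
def annual_bandwidth_savings_pb(annual_volume_pb: float, bd_rate_percent: float) -> float:
    """Bandwidth saved per year by switching to config B, given B's BD-rate vs A.
    A negative BD-rate means B is more efficient, so savings come out positive."""
    return annual_volume_pb * -bd_rate_percent / 100

print(annual_bandwidth_savings_pb(100, -30))  # 30.0 PB/year
print(annual_bandwidth_savings_pb(100, -5))   # 5.0 PB/year: modest at this scale
```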

#BD-rate caveats

Things BD-rate doesn't capture:

1. Encoding time.

Configuration B might be 2x more efficient (BD-rate VMAF -50%) but 10x slower. If wall-time matters (live streaming), BD-rate alone doesn't tell you which to choose.

2. Hardware compatibility.

Configuration B might be more efficient but require hardware some of your audience doesn't have. BD-rate VMAF is meaningless for unreachable audience.

3. Specific content sensitivity.

BD-rate measures average across the test corpus. Specific content types might have very different efficiency relationships. For a comparison that's content-specific, use a content-specific corpus.

4. Quality range applicability.

BD-rate is computed over the overlapping quality range. If your production quality target is at the edge of the range, BD-rate may not extrapolate.

5. Subjective vs objective quality.

VMAF correlates with MOS but isn't MOS. For premium content where subjective quality matters most, supplement BD-rate VMAF with subjective testing.

For pipeline decisions, treat BD-rate as one input among several. It's necessary but not sufficient.

#BD-rate analysis workflow

A practical workflow for BD-rate analysis:

# pseudocode

def bd_rate_analysis(content_corpus, config_a, config_b, bitrates):
    results_a = []
    results_b = []
    
    for content in content_corpus:
        for bitrate in bitrates:
            # Encode with each config
            encoded_a = encode(content, config_a, bitrate)
            encoded_b = encode(content, config_b, bitrate)
            
            # Compute VMAF
            vmaf_a = compute_vmaf(content, encoded_a)
            vmaf_b = compute_vmaf(content, encoded_b)
            
            results_a.append((content, bitrate, vmaf_a))
            results_b.append((content, bitrate, vmaf_b))
    
    # Compute BD-rate per content, then average
    per_content_bd_rates = []
    for content in content_corpus:
        bitrates_a, qualities_a = get_results(results_a, content)
        bitrates_b, qualities_b = get_results(results_b, content)
        per_content_bd_rates.append(bd_rate(bitrates_a, qualities_a, bitrates_b, qualities_b))
    
    average_bd_rate = sum(per_content_bd_rates) / len(per_content_bd_rates)
    return average_bd_rate, per_content_bd_rates

For production, this is automated and run periodically (e.g., when new encoder versions ship).

#Common BD-rate analysis mistakes

Mistake 1: Comparing different quality metrics.

Comparing BD-rate PSNR for one config with BD-rate VMAF for another. Not meaningful; results aren't comparable.

Mistake 2: Insufficient bitrate points.

Computing BD-rate from only 2-3 points. The polynomial fit is poorly constrained; results are unstable.

Mistake 3: Bitrate range not spanning quality range.

If all tested bitrates produce VMAF >= 95, the curves are nearly flat in the saturated region; BD-rate is ill-defined.

Mistake 4: Using CRF instead of VBR.

CRF doesn't target specific bitrates; you'll get inconsistent bitrate measurements that don't fit into BD-rate analysis cleanly.

Mistake 5: Single-content analysis.

Different content responds differently to different codecs. A BD-rate computed from one piece of content doesn't generalize.

Mistake 6: Ignoring encoder version.

x265 v3.5 and v3.6 produce different BD-rates against the same reference. Pin encoder versions; document them in results.

#Production BD-rate use cases

Codec evaluation: comparing AV1 vs HEVC vs H.264 for production ladder decisions. BD-rate VMAF tells you bandwidth savings; combined with audience reach data, tells you whether AV1 is worth deploying.

Encoder version evaluation: comparing x265 v3.5 vs v3.6 to detect quality regressions. BD-rate quantifies improvement or regression magnitude.

Preset evaluation: comparing x265 medium vs slow. BD-rate at the slow preset is typically negative (slow is more efficient); the magnitude tells you whether the additional compute time is worth it.

Per-title vs universal ladder: BD-rate of per-title encoding vs universal ladder. Quantifies the savings per-title delivers.

Custom encoder configuration tuning: testing a new combination of x265 parameters against the baseline. BD-rate validates whether the new configuration helps.

#Operational considerations

Things that matter for BD-rate analysis in production:

  • Reproducibility — pin encoder versions, content, bitrate selection, BD-rate tool version.
  • Statistical significance — small BD-rate values (~5%) may be within noise margin; verify with multiple runs.
  • Content corpus selection — match corpus to your production content type.
  • Documentation — record the configuration of every BD-rate analysis for future reference.
  • Continuous BD-rate monitoring — track BD-rate of your default encoder configuration vs reference over time. Detect regressions.
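A continuous-monitoring check reduces to a threshold comparison; a hedged sketch (the 2% tolerance is an illustrative assumption, not a standard):

```python
def bd_rate_regressed(bd_vs_reference: float, tolerance_percent: float = 2.0) -> bool:
    """Flag a regression when the default config's BD-rate against the pinned
    reference drifts past `tolerance_percent` in the positive (worse) direction.
    The 2% default is an illustrative assumption."""
    return bd_vs_reference > tolerance_percent

print(bd_rate_regressed(0.4))  # False: within noise
print(bd_rate_regressed(3.1))  # True: flag for investigation
```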

#What MpegFlow does with BD-rate

BD-rate aggregation is not currently a pipeline-native operation in MpegFlow's DAG runtime. The pipeline runs the encoding portion that feeds a BD-rate analysis (parallel encode stages at the candidate bitrate sweep), and the VMAF / PSNR measurement portion (discrete measurement stages with the FFmpeg quality filters). The Bjontegaard delta-rate aggregation across the resulting (bitrate, quality) curves runs today in external scripts on the pipeline's structured output, not as a pipeline stage. Adding BD-rate as a native stage is on the backlog.

For internal engineering at MpegFlow, BD-rate is part of encoder regression testing — every encoder version update is BD-rate-tested against a representative corpus before deployment, with the encoding work happening on the pipeline and the aggregation happening in supporting tooling.

For customers evaluating encoder changes (e.g., x265 to SVT-HEVC, x264 medium to fast), the same shape applies: pipeline encodes the candidate set, pipeline measures quality, external tooling computes BD-rate from the structured output. We provide the tooling and recommend interpretation.

The strict-broker security model handles the encoding and measurement portions the same as any analysis — workers receive content via short-lived presigned URLs, encode, compute metrics, emit results.

The general guidance: BD-rate is the standard for codec comparison; understand the methodology; use it for decisions but combine with operational considerations (encoding time, hardware compatibility, etc.). Don't rely on BD-rate alone for production choices; do rely on it for codec/encoder efficiency comparisons.

Tags
  • bd-rate
  • quality
  • codec-evaluation
  • vmaf
  • psnr
  • bjontegaard
See also

Related topics and reading

  • PSNR — the classic quality metric, why it persists, and where it fails
  • SSIM — the structural similarity metric and its multi-scale variants
  • VMAF — Netflix's quality metric and the modern reference for video quality measurement