BD-rate (Bjontegaard delta rate) is the standard way to express "codec A is X% more efficient than codec B at equivalent quality." Every codec comparison paper since the early 2000s reports BD-rate. Every encoder comparison in production uses it. The calculation is precise but non-obvious — you encode at multiple bitrates with each configuration, plot bitrate vs quality, then compute the area between the curves. This page is the engineering reference for the BD-rate calculation procedure and how to interpret results.
What BD-rate measures
BD-rate (Bjontegaard delta rate) compares two codec/encoder configurations:
- Negative BD-rate = the second config achieves the same quality at lower bitrate (the second config is more efficient).
- Positive BD-rate = the second config requires higher bitrate for the same quality (the second config is less efficient).
- Zero BD-rate = configurations are equivalent.
A "BD-rate VMAF of -25%" means: averaged over the overlapping quality range, configuration B needs roughly 25% less bitrate than configuration A to hit the same VMAF. Substantially more efficient.
The unit is percentage (proportional bitrate difference), not absolute. This makes BD-rate comparable across content types and quality ranges.
The math
The procedure:
- Encode the same source at multiple bitrates with codec/encoder A, yielding a set of (bitrate, quality) data points.
- Encode the same source at the same set of bitrates with codec/encoder B, yielding a second set of (bitrate, quality) data points.
- Fit a curve to each set (the classic Bjontegaard method fits a 3rd-order polynomial of log-bitrate as a function of quality).
- Compute the area between the curves within the overlapping quality range.
- Express the area as a percentage of the average bitrate.
The result is BD-rate.
The curve fit and integration are done in log-bitrate space because compression efficiency typically scales logarithmically with bitrate. Using log space makes the curves more linear and the integration more accurate.
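As a compact statement of the steps above, with R_A(q) and R_B(q) the fitted rate curves and [q_min, q_max] the overlapping quality range:

```latex
\mathrm{BD\text{-}rate} =
\left( 10^{\frac{1}{q_{\max}-q_{\min}}
  \int_{q_{\min}}^{q_{\max}}
  \bigl[\log_{10} R_B(q) - \log_{10} R_A(q)\bigr]\,dq}
  - 1 \right) \times 100\,\%
```

The exponent is the average gap between the log-bitrate curves; raising 10 to it converts the gap back into a bitrate ratio, and subtracting 1 turns the ratio into a percentage difference.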
The encoding procedure
To compute BD-rate, you need at least 4-5 (bitrate, quality) data points per configuration. The typical procedure:
Step 1: Pick representative content.
Use a corpus of test sequences. Common choices:
- JVET test sequences — the standard for codec comparison papers. Industry-standard.
- Netflix test set — used in VMAF training; good for streaming-aligned tests.
- Custom corpus — your specific content (recommended for production decisions).
Step 2: Pick bitrates spanning the quality range.
For a 1080p video, typical bitrates: 1, 2, 4, 6, 10 Mbps. Adjust based on what you're testing.
The bitrates should produce VMAF values from ~75 (lower end) to ~95+ (high end). If your range doesn't cover this, BD-rate becomes less meaningful.
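Because efficiency scales roughly logarithmically with bitrate, a log-spaced (geometric) ladder covers the quality range more evenly than a linear one. A minimal sketch; the 1-10 Mbps endpoints come from the 1080p example above:

```python
import numpy as np

# Log-spaced bitrate ladder between the lowest and highest target bitrates.
# 5 points from 1 Mbps to 10 Mbps (in kbps); each step is a constant ratio.
ladder_kbps = np.geomspace(1000, 10000, num=5)
print([round(b) for b in ladder_kbps])  # → [1000, 1778, 3162, 5623, 10000]
```

If the VMAF scores from the resulting encodes don't span ~75 to ~95+, shift or widen the endpoints and regenerate the ladder.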
Step 3: Encode at each bitrate.
For each codec/encoder configuration, encode the source content at each chosen bitrate. Use VBR or capped CRF rather than plain CRF: plain CRF targets a quality level, not a bitrate, so the rate points won't land where you chose them.
Step 4: Compute quality at each bitrate.
Run VMAF (or PSNR / SSIM) on each encoded version against the source. Record the results.
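A sketch of step 4, assuming an ffmpeg build with libvmaf and the JSON log layout of libvmaf 2.x (the `pooled_metrics` key); `run_vmaf` shells out to ffmpeg and is not exercised here, only the parsing is:

```python
import json
import subprocess

def run_vmaf(distorted_path, reference_path, log_path="vmaf.json"):
    # Distorted stream is the first input, reference the second.
    subprocess.run([
        "ffmpeg", "-i", distorted_path, "-i", reference_path,
        "-lavfi", f"libvmaf=log_fmt=json:log_path={log_path}",
        "-f", "null", "-",
    ], check=True)
    with open(log_path) as f:
        return parse_vmaf_log(json.load(f))

def parse_vmaf_log(log):
    # libvmaf 2.x JSON: the pooled VMAF score lives under pooled_metrics.
    return log["pooled_metrics"]["vmaf"]["mean"]

# Parsing demo on a trimmed-down log structure:
sample_log = {"pooled_metrics": {"vmaf": {"min": 88.1, "max": 96.3, "mean": 92.7}}}
print(parse_vmaf_log(sample_log))  # → 92.7
```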
Step 5: Run BD-rate calculation.
Use a BD-rate calculator (libbjontegaard, custom Python script, etc.) on the (bitrate, quality) point sets.
Tooling for BD-rate
libbjontegaard — Python library:
```python
from libbjontegaard import bd_rate

# Configuration A: codec A at 4 bitrates (kbps) with measured VMAF scores
bitrates_a = [1000, 2000, 4000, 8000]
quality_a = [78.5, 86.2, 91.8, 95.1]

# Configuration B: codec B at the same 4 bitrates
bitrates_b = [1000, 2000, 4000, 8000]
quality_b = [82.1, 88.9, 93.2, 96.0]

# Compute BD-rate of B relative to A
bd_rate_value = bd_rate(bitrates_a, quality_a, bitrates_b, quality_b)
print(f"BD-rate: {bd_rate_value:.2f}%")
```
The BD-rate is negative if B is more efficient than A; positive if A is more efficient.
Custom implementation:
```python
import numpy as np

def bd_rate(bitrate_a, quality_a, bitrate_b, quality_b):
    """Compute the Bjontegaard delta rate of B relative to A, in percent.

    Negative means B needs less bitrate than A at equal quality.
    """
    log_br_a = np.log10(bitrate_a)
    log_br_b = np.log10(bitrate_b)
    # Fit cubic polynomials of log-bitrate as a function of quality
    # (the classic method uses 3rd order; with 4 data points the fit
    # is an exact interpolation).
    p_a = np.polyfit(quality_a, log_br_a, 3)
    p_b = np.polyfit(quality_b, log_br_b, 3)
    # Integration range: overlap of the two quality ranges
    qmin = max(min(quality_a), min(quality_b))
    qmax = min(max(quality_a), max(quality_b))
    # Integrate the polynomials over that range
    int_a = np.polyint(p_a)
    int_b = np.polyint(p_b)
    integral_a = np.polyval(int_a, qmax) - np.polyval(int_a, qmin)
    integral_b = np.polyval(int_b, qmax) - np.polyval(int_b, qmin)
    # Average log-bitrate gap over the range, mapped back to a
    # proportional bitrate difference
    avg_diff = (integral_b - integral_a) / (qmax - qmin)
    return (10 ** avg_diff - 1) * 100
```
The math is well-defined; implementations vary in numerical stability. For production use, libbjontegaard is the standard.
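A quick sanity check for any implementation: shift one curve's bitrates down by a known factor with quality unchanged, and the BD-rate must come out as exactly that factor. The fit-and-integrate steps are repeated here so the example is self-contained:

```python
import numpy as np

def bd_rate(bitrate_a, quality_a, bitrate_b, quality_b):
    # Cubic fit of log-bitrate vs quality, integrated over the quality overlap.
    p_a = np.polyfit(quality_a, np.log10(bitrate_a), 3)
    p_b = np.polyfit(quality_b, np.log10(bitrate_b), 3)
    qmin = max(min(quality_a), min(quality_b))
    qmax = min(max(quality_a), max(quality_b))
    int_a, int_b = np.polyint(p_a), np.polyint(p_b)
    diff = (np.polyval(int_b, qmax) - np.polyval(int_b, qmin)
            - np.polyval(int_a, qmax) + np.polyval(int_a, qmin))
    return (10 ** (diff / (qmax - qmin)) - 1) * 100

bitrates_a = [1000, 2000, 4000, 8000]
quality = [78.5, 86.2, 91.8, 95.1]
# B reaches the same quality at 10% less bitrate everywhere:
bitrates_b = [b * 0.9 for b in bitrates_a]
print(round(bd_rate(bitrates_a, quality, bitrates_b, quality), 2))  # → -10.0
```

If this case doesn't return -10%, the curve fit, the integration range, or the final exponentiation is wrong.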
BD-rate VMAF vs PSNR vs SSIM
Different quality metrics produce different BD-rate values for the same comparison:
- BD-rate PSNR — historical standard. What codec papers report. Tends to favor codecs that mathematically optimize PSNR.
- BD-rate SSIM — middle ground. Slightly better correlation with perception than PSNR.
- BD-rate VMAF — modern standard for streaming. Best correlation with perception.
The differences can be substantial:
- AV1 vs HEVC, BD-rate PSNR: ~30% improvement.
- AV1 vs HEVC, BD-rate VMAF: ~25-35% improvement (similar but content-dependent).
For codec research, BD-rate PSNR is the historical baseline. For production decisions, BD-rate VMAF is the more meaningful number.
Most modern papers report all three (BD-rate PSNR, SSIM, VMAF) for completeness.
Interpreting BD-rate
A BD-rate value tells you the efficiency difference. Translating to practical implications:
-30% BD-rate (B 30% more efficient than A): For a 100 PB/year streaming service, switching from A to B saves ~30 PB/year of bandwidth. Compounds over years.
-15% BD-rate (B 15% more efficient): Modest improvement. May or may not justify migration costs depending on scale.
-5% BD-rate: Within noise margin for many comparisons. Probably not worth migrating for.
+5% BD-rate (B is less efficient): Configuration A is better. Migrating would be a step in the wrong direction.
+15% BD-rate: A is meaningfully better. Worth understanding why before considering B.
The threshold for "worth migrating" depends on:
- Your streaming volume (more volume = more value per percentage point).
- Migration cost (compute, ops, testing).
- Audience reach (B's lower bitrate only matters if your audience can decode B).
BD-rate caveats
Things BD-rate doesn't capture:
1. Encoding time.
Configuration B might be 2x more efficient (BD-rate VMAF -50%) but 10x slower. If wall-time matters (live streaming), BD-rate alone doesn't tell you which to choose.
2. Hardware compatibility.
Configuration B might be more efficient but require hardware some of your audience doesn't have. BD-rate VMAF is meaningless for unreachable audience.
3. Specific content sensitivity.
BD-rate measures average across the test corpus. Specific content types might have very different efficiency relationships. For a comparison that's content-specific, use a content-specific corpus.
4. Quality range applicability.
BD-rate is computed over the overlapping quality range. If your production quality target is at the edge of the range, BD-rate may not extrapolate.
5. Subjective vs objective quality.
VMAF correlates with MOS but isn't MOS. For premium content where subjective quality matters most, supplement BD-rate VMAF with subjective testing.
For pipeline decisions, treat BD-rate as one input among several. It's necessary but not sufficient.
BD-rate analysis workflow
A practical workflow for BD-rate analysis:
```python
# pseudocode
def bd_rate_analysis(content_corpus, config_a, config_b, bitrates):
    results_a = []
    results_b = []
    for content in content_corpus:
        for bitrate in bitrates:
            # Encode with each config
            encoded_a = encode(content, config_a, bitrate)
            encoded_b = encode(content, config_b, bitrate)
            # Compute VMAF against the source
            vmaf_a = compute_vmaf(content, encoded_a)
            vmaf_b = compute_vmaf(content, encoded_b)
            results_a.append((content, bitrate, vmaf_a))
            results_b.append((content, bitrate, vmaf_b))
    # Compute BD-rate per content, then average across the corpus
    per_content_bd_rates = []
    for content in content_corpus:
        bitrates_a, qualities_a = get_results(results_a, content)
        bitrates_b, qualities_b = get_results(results_b, content)
        per_content_bd_rates.append(
            bd_rate(bitrates_a, qualities_a, bitrates_b, qualities_b))
    average_bd_rate = sum(per_content_bd_rates) / len(per_content_bd_rates)
    return average_bd_rate, per_content_bd_rates
```
For production, this is automated and run periodically (e.g., when new encoder versions ship).
Common BD-rate analysis mistakes
Mistake 1: Comparing different quality metrics.
Comparing BD-rate PSNR for one config with BD-rate VMAF for another. Not meaningful; results aren't comparable.
Mistake 2: Insufficient bitrate points.
Computing BD-rate from only 2-3 points. The polynomial fit is poorly constrained; results are unstable.
Mistake 3: Bitrate range not spanning quality range.
If all of your rate points produce VMAF >= 95, both curves are nearly flat; the polynomial fit is poorly conditioned and BD-rate is ill-defined.
Mistake 4: Using CRF instead of VBR.
CRF targets a quality level, not a specific bitrate; you'll get rate points that drift per title and don't fit into BD-rate analysis cleanly.
Mistake 5: Single-content analysis.
Different content responds differently to different codecs. A BD-rate computed from one piece of content doesn't generalize.
Mistake 6: Ignoring encoder version.
x265 v3.5 and v3.6 produce different BD-rates against the same comparison. Pin encoder versions; document them in results.
Production BD-rate use cases
Codec evaluation: comparing AV1 vs HEVC vs H.264 for production ladder decisions. BD-rate VMAF tells you bandwidth savings; combined with audience reach data, tells you whether AV1 is worth deploying.
Encoder version evaluation: comparing x265 v3.5 vs v3.6 to detect quality regressions. BD-rate quantifies improvement or regression magnitude.
Preset evaluation: comparing x265 medium vs slow. The BD-rate of slow relative to medium is typically negative (slow is more efficient); the magnitude tells you whether the additional compute time is worth it.
Per-title vs universal ladder: BD-rate of per-title encoding vs universal ladder. Quantifies the savings per-title delivers.
Custom encoder configuration tuning: testing a new combination of x265 parameters against the baseline. BD-rate validates whether the new configuration helps.
Operational considerations
Things that matter for BD-rate analysis in production:
- Reproducibility — pin encoder versions, content, bitrate selection, BD-rate tool version.
- Statistical significance — small BD-rate values (~5%) may be within noise margin; verify with multiple runs.
- Content corpus selection — match corpus to your production content type.
- Documentation — record the configuration of every BD-rate analysis for future reference.
- Continuous BD-rate monitoring — track BD-rate of your default encoder configuration vs reference over time. Detect regressions.
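For the statistical-significance point, a bootstrap over per-content BD-rates gives a cheap confidence interval; the sample values below are hypothetical:

```python
import numpy as np

# Hypothetical per-content BD-rate VMAF results (%) from one corpus run.
per_content = np.array([-6.2, -3.1, -8.0, -4.5, -1.9, -7.3, -5.0, -2.6])

rng = np.random.default_rng(0)
# Resample the corpus with replacement and collect the mean each time.
boot_means = [rng.choice(per_content, size=per_content.size, replace=True).mean()
              for _ in range(10_000)]
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean {per_content.mean():.1f}%, 95% CI [{ci_low:.1f}%, {ci_high:.1f}%]")
```

If the interval straddles zero, the measured improvement is within noise for that corpus; rerun with more content before acting on it.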
What MpegFlow does with BD-rate
BD-rate aggregation is not currently a pipeline-native operation in MpegFlow's DAG runtime. The pipeline runs the encoding portion that feeds a BD-rate analysis (parallel encode stages at the candidate bitrate sweep), and the VMAF / PSNR measurement portion (discrete measurement stages with the FFmpeg quality filters). The Bjontegaard delta-rate aggregation across the resulting (bitrate, quality) curves runs today in external scripts on the pipeline's structured output, not as a pipeline stage. Adding BD-rate as a native stage is on the backlog.
For internal engineering at MpegFlow, BD-rate is part of encoder regression testing — every encoder version update is BD-rate-tested against a representative corpus before deployment, with the encoding work happening on the pipeline and the aggregation happening in supporting tooling.
For customers evaluating encoder changes (e.g., x265 to SVT-HEVC, x264 medium to fast), the same shape applies: pipeline encodes the candidate set, pipeline measures quality, external tooling computes BD-rate from the structured output. We provide the tooling and recommend interpretation.
The strict-broker security model handles the encoding and measurement portions the same as any analysis — workers receive content via short-lived presigned URLs, encode, compute metrics, emit results.
The general guidance: BD-rate is the standard for codec comparison; understand the methodology; use it for decisions but combine with operational considerations (encoding time, hardware compatibility, etc.). Don't rely on BD-rate alone for production choices; do rely on it for codec/encoder efficiency comparisons.