BD-rate (Bjontegaard delta rate) is the standard way to express "codec A is X% more efficient than codec B at equivalent quality." Every codec comparison paper since the early 2000s reports BD-rate. Every encoder comparison in production uses it. The calculation is precise but non-obvious — you encode at multiple bitrates with each configuration, plot bitrate vs quality, then compute the area between the curves. This page is the engineering reference for the BD-rate calculation procedure and how to interpret results.
What BD-rate measures
BD-rate (Bjontegaard delta rate) compares two codec/encoder configurations:
- Negative BD-rate = the second config achieves the same quality at lower bitrate (the second config is more efficient).
- Positive BD-rate = the second config requires higher bitrate for the same quality (the second config is less efficient).
- Zero BD-rate = configurations are equivalent.
A "BD-rate VMAF of -25%" means: averaged over the overlapping quality range, configuration B needs roughly 25% less bitrate than configuration A to hit the same VMAF. Substantially more efficient.
The unit is percentage (proportional bitrate difference), not absolute. This makes BD-rate comparable across content types and quality ranges.
The math
The procedure:
- Encode the same source at multiple bitrates with codec/encoder A, yielding a set of (bitrate, quality) data points.
- Encode the same source at the same set of bitrates with codec/encoder B, yielding a second set of (bitrate, quality) data points.
- Fit a curve to each set (the classic Bjontegaard method fits a 3rd-order polynomial of log-bitrate as a function of quality).
- Compute the area between the curves within the overlapping quality range.
- Express the area as a percentage of the average bitrate.
The result is BD-rate.
The curve fit and integration are done in log-bitrate space because compression efficiency typically scales logarithmically with bitrate. Using log space makes the curves more linear and the integration more accurate.
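As a compact statement of the steps above, with R_A(q) and R_B(q) the fitted rate curves and [q_min, q_max] the overlapping quality range:

```latex
\mathrm{BD\text{-}rate} =
\left( 10^{\frac{1}{q_{\max}-q_{\min}}
  \int_{q_{\min}}^{q_{\max}}
  \bigl[\log_{10} R_B(q) - \log_{10} R_A(q)\bigr]\,dq}
  - 1 \right) \times 100\,\%
```

The exponent is the average gap between the log-bitrate curves; raising 10 to it converts the gap back into a bitrate ratio, and subtracting 1 turns the ratio into a percentage difference.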
The encoding procedure
To compute BD-rate, you need at least 4-5 (bitrate, quality) data points per configuration. The typical procedure:
Step 1: Pick representative content.
Use a corpus of test sequences. Common choices:
- JVET test sequences — the standard for codec comparison papers. Industry-standard.
- Netflix test set — used in VMAF training; good for streaming-aligned tests.
- Custom corpus — your specific content (recommended for production decisions).
Step 2: Pick bitrates spanning the quality range.
For a 1080p video, typical bitrates: 1, 2, 4, 6, 10 Mbps. Adjust based on what you're testing.
The bitrates should produce VMAF values from ~75 (lower end) to ~95+ (high end). If your range doesn't cover this, BD-rate becomes less meaningful.
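Because efficiency scales roughly logarithmically with bitrate, a log-spaced (geometric) ladder covers the quality range more evenly than a linear one. A minimal sketch; the 1-10 Mbps endpoints come from the 1080p example above:

```python
import numpy as np

# Log-spaced bitrate ladder between the lowest and highest target bitrates.
# 5 points from 1 Mbps to 10 Mbps (in kbps); each step is a constant ratio.
ladder_kbps = np.geomspace(1000, 10000, num=5)
print([round(b) for b in ladder_kbps])  # → [1000, 1778, 3162, 5623, 10000]
```

If the VMAF scores from the resulting encodes don't span ~75 to ~95+, shift or widen the endpoints and regenerate the ladder.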
Step 3: Encode at each bitrate.
For each codec/encoder configuration, encode the source content at each chosen bitrate. Use VBR or capped CRF rather than plain CRF: plain CRF targets a quality level, not a bitrate, so the rate points won't land where you chose them.
Step 4: Compute quality at each bitrate.
Run VMAF (or PSNR / SSIM) on each encoded version against the source. Record the results.
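A sketch of step 4, assuming an ffmpeg build with libvmaf and the JSON log layout of libvmaf 2.x (the `pooled_metrics` key); `run_vmaf` shells out to ffmpeg and is not exercised here, only the parsing is:

```python
import json
import subprocess

def run_vmaf(distorted_path, reference_path, log_path="vmaf.json"):
    # Distorted stream is the first input, reference the second.
    subprocess.run([
        "ffmpeg", "-i", distorted_path, "-i", reference_path,
        "-lavfi", f"libvmaf=log_fmt=json:log_path={log_path}",
        "-f", "null", "-",
    ], check=True)
    with open(log_path) as f:
        return parse_vmaf_log(json.load(f))

def parse_vmaf_log(log):
    # libvmaf 2.x JSON: the pooled VMAF score lives under pooled_metrics.
    return log["pooled_metrics"]["vmaf"]["mean"]

# Parsing demo on a trimmed-down log structure:
sample_log = {"pooled_metrics": {"vmaf": {"min": 88.1, "max": 96.3, "mean": 92.7}}}
print(parse_vmaf_log(sample_log))  # → 92.7
```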
Step 5: Run BD-rate calculation.
Use a BD-rate calculator (libbjontegaard, custom Python script, etc.) on the (bitrate, quality) point sets.
Tooling for BD-rate
libbjontegaard — Python library:
```python
from libbjontegaard import bd_rate

# Configuration A: codec A at 4 bitrates (kbps) with measured VMAF scores
bitrates_a = [1000, 2000, 4000, 8000]
quality_a = [78.5, 86.2, 91.8, 95.1]

# Configuration B: codec B at the same 4 bitrates
bitrates_b = [1000, 2000, 4000, 8000]
quality_b = [82.1, 88.9, 93.2, 96.0]

# Compute BD-rate of B relative to A
bd_rate_value = bd_rate(bitrates_a, quality_a, bitrates_b, quality_b)
print(f"BD-rate: {bd_rate_value:.2f}%")
```
The BD-rate is negative if B is more efficient than A; positive if A is more efficient.
Custom implementation:
```python
import numpy as np

def bd_rate(bitrate_a, quality_a, bitrate_b, quality_b):
    """Compute the Bjontegaard delta rate of B relative to A, in percent.

    Negative means B needs less bitrate than A at equal quality.
    """
    log_br_a = np.log10(bitrate_a)
    log_br_b = np.log10(bitrate_b)
    # Fit cubic polynomials of log-bitrate as a function of quality
    # (the classic method uses 3rd order; with 4 data points the fit
    # is an exact interpolation).
    p_a = np.polyfit(quality_a, log_br_a, 3)
    p_b = np.polyfit(quality_b, log_br_b, 3)
    # Integration range: overlap of the two quality ranges
    qmin = max(min(quality_a), min(quality_b))
    qmax = min(max(quality_a), max(quality_b))
    # Integrate the polynomials over that range
    int_a = np.polyint(p_a)
    int_b = np.polyint(p_b)
    integral_a = np.polyval(int_a, qmax) - np.polyval(int_a, qmin)
    integral_b = np.polyval(int_b, qmax) - np.polyval(int_b, qmin)
    # Average log-bitrate gap over the range, mapped back to a
    # proportional bitrate difference
    avg_diff = (integral_b - integral_a) / (qmax - qmin)
    return (10 ** avg_diff - 1) * 100
```
The math is well-defined; implementations vary in numerical stability. For production use, libbjontegaard is the standard.
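A quick sanity check for any implementation: shift one curve's bitrates down by a known factor with quality unchanged, and the BD-rate must come out as exactly that factor. The fit-and-integrate steps are repeated here so the example is self-contained:

```python
import numpy as np

def bd_rate(bitrate_a, quality_a, bitrate_b, quality_b):
    # Cubic fit of log-bitrate vs quality, integrated over the quality overlap.
    p_a = np.polyfit(quality_a, np.log10(bitrate_a), 3)
    p_b = np.polyfit(quality_b, np.log10(bitrate_b), 3)
    qmin = max(min(quality_a), min(quality_b))
    qmax = min(max(quality_a), max(quality_b))
    int_a, int_b = np.polyint(p_a), np.polyint(p_b)
    diff = (np.polyval(int_b, qmax) - np.polyval(int_b, qmin)
            - np.polyval(int_a, qmax) + np.polyval(int_a, qmin))
    return (10 ** (diff / (qmax - qmin)) - 1) * 100

bitrates_a = [1000, 2000, 4000, 8000]
quality = [78.5, 86.2, 91.8, 95.1]
# B reaches the same quality at 10% less bitrate everywhere:
bitrates_b = [b * 0.9 for b in bitrates_a]
print(round(bd_rate(bitrates_a, quality, bitrates_b, quality), 2))  # → -10.0
```

If this case doesn't return -10%, the curve fit, the integration range, or the final exponentiation is wrong.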
BD-rate VMAF vs PSNR vs SSIM
Different quality metrics produce different BD-rate values for the same comparison:
- BD-rate PSNR — historical standard. What codec papers report. Tends to favor codecs that mathematically optimize PSNR.
- BD-rate SSIM — middle ground. Slightly better correlation with perception than PSNR.
- BD-rate VMAF — modern standard for streaming. Best correlation with perception.
The differences can be substantial:
- AV1 vs HEVC, BD-rate PSNR: ~30% improvement.
- AV1 vs HEVC, BD-rate VMAF: ~25-35% improvement (similar but content-dependent).
For codec research, BD-rate PSNR is the historical baseline. For production decisions, BD-rate VMAF is the more meaningful number.
Most modern papers report all three (BD-rate PSNR, SSIM, VMAF) for completeness.
Interpreting BD-rate
A BD-rate value tells you the efficiency difference. Translating to practical implications:
-30% BD-rate (B 30% more efficient than A): For a 100 PB/year streaming service, switching from A to B saves ~30 PB/year of bandwidth. Compounds over years.
-15% BD-rate (B 15% more efficient): Modest improvement. May or may not justify migration costs depending on scale.
-5% BD-rate: Within noise margin for many comparisons. Probably not worth migrating for.
+5% BD-rate (B is less efficient): Configuration A is better. Migrating would be a step in the wrong direction.
+15% BD-rate: A is meaningfully better. Worth understanding why before considering B.
The threshold for "worth migrating" depends on:
- Your streaming volume (more volume = more value per percentage point).
- Migration cost (compute, ops, testing).
- Audience reach (B's lower bitrate only matters if your audience can decode B).
BD-rate caveats
Things BD-rate doesn't capture:
1. Encoding time.
Configuration B might be 2x more efficient (BD-rate VMAF -50%) but 10x slower. If wall-time matters (live streaming), BD-rate alone doesn't tell you which to choose.
2. Hardware compatibility.
Configuration B might be more efficient but require hardware some of your audience doesn't have. BD-rate VMAF is meaningless for unreachable audience.
3. Specific content sensitivity.
BD-rate measures average across the test corpus. Specific content types might have very different efficiency relationships. For a comparison that's content-specific, use a content-specific corpus.
4. Quality range applicability.
BD-rate is computed over the overlapping quality range. If your production quality target is at the edge of the range, BD-rate may not extrapolate.
5. Subjective vs objective quality.
VMAF correlates with MOS but isn't MOS. For premium content where subjective quality matters most, supplement BD-rate VMAF with subjective testing.
For pipeline decisions, treat BD-rate as one input among several. It's necessary but not sufficient.
BD-rate analysis workflow
A practical workflow for BD-rate analysis:
```python
# pseudocode
def bd_rate_analysis(content_corpus, config_a, config_b, bitrates):
    results_a = []
    results_b = []
    for content in content_corpus:
        for bitrate in bitrates:
            # Encode with each config
            encoded_a = encode(content, config_a, bitrate)
            encoded_b = encode(content, config_b, bitrate)
            # Compute VMAF against the source
            vmaf_a = compute_vmaf(content, encoded_a)
            vmaf_b = compute_vmaf(content, encoded_b)
            results_a.append((content, bitrate, vmaf_a))
            results_b.append((content, bitrate, vmaf_b))
    # Compute BD-rate per content, then average across the corpus
    per_content_bd_rates = []
    for content in content_corpus:
        bitrates_a, qualities_a = get_results(results_a, content)
        bitrates_b, qualities_b = get_results(results_b, content)
        per_content_bd_rates.append(
            bd_rate(bitrates_a, qualities_a, bitrates_b, qualities_b))
    average_bd_rate = sum(per_content_bd_rates) / len(per_content_bd_rates)
    return average_bd_rate, per_content_bd_rates
```
For production, this is automated and run periodically (e.g., when new encoder versions ship).
Common BD-rate analysis mistakes
Mistake 1: Comparing different quality metrics.
Comparing BD-rate PSNR for one config with BD-rate VMAF for another. Not meaningful; results aren't comparable.
Mistake 2: Insufficient bitrate points.
Computing BD-rate from only 2-3 points. The polynomial fit is poorly constrained; results are unstable.
Mistake 3: Bitrate range not spanning quality range.
If all of your rate points produce VMAF >= 95, both curves are nearly flat; the polynomial fit is poorly conditioned and BD-rate is ill-defined.
Mistake 4: Using CRF instead of VBR.
CRF targets a quality level, not a specific bitrate; you'll get rate points that drift per title and don't fit into BD-rate analysis cleanly.
Mistake 5: Single-content analysis.
Different content responds differently to different codecs. A BD-rate computed from one piece of content doesn't generalize.
Mistake 6: Ignoring encoder version.
x265 v3.5 and v3.6 produce different BD-rates against the same comparison. Pin encoder versions; document them in results.
Production BD-rate use cases
Codec evaluation: comparing AV1 vs HEVC vs H.264 for production ladder decisions. BD-rate VMAF tells you bandwidth savings; combined with audience reach data, tells you whether AV1 is worth deploying.
Encoder version evaluation: comparing x265 v3.5 vs v3.6 to detect quality regressions. BD-rate quantifies improvement or regression magnitude.
Preset evaluation: comparing x265 medium vs slow. The BD-rate of slow relative to medium is typically negative (slow is more efficient); the magnitude tells you whether the additional compute time is worth it.
Per-title vs universal ladder: BD-rate of per-title encoding vs universal ladder. Quantifies the savings per-title delivers.
Custom encoder configuration tuning: testing a new combination of x265 parameters against the baseline. BD-rate validates whether the new configuration helps.
Operational considerations
Things that matter for BD-rate analysis in production:
- Reproducibility — pin encoder versions, content, bitrate selection, BD-rate tool version.
- Statistical significance — small BD-rate values (~5%) may be within noise margin; verify with multiple runs.
- Content corpus selection — match corpus to your production content type.
- Documentation — record the configuration of every BD-rate analysis for future reference.
- Continuous BD-rate monitoring — track BD-rate of your default encoder configuration vs reference over time. Detect regressions.
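For the statistical-significance point, a bootstrap over per-content BD-rates gives a cheap confidence interval; the sample values below are hypothetical:

```python
import numpy as np

# Hypothetical per-content BD-rate VMAF results (%) from one corpus run.
per_content = np.array([-6.2, -3.1, -8.0, -4.5, -1.9, -7.3, -5.0, -2.6])

rng = np.random.default_rng(0)
# Resample the corpus with replacement and collect the mean each time.
boot_means = [rng.choice(per_content, size=per_content.size, replace=True).mean()
              for _ in range(10_000)]
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean {per_content.mean():.1f}%, 95% CI [{ci_low:.1f}%, {ci_high:.1f}%]")
```

If the interval straddles zero, the measured improvement is within noise for that corpus; rerun with more content before acting on it.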
What MpegFlow does with BD-rate
BD-rate aggregation is not currently a pipeline-native operation in MpegFlow's DAG runtime. The pipeline runs the encoding portion that feeds a BD-rate analysis (parallel encode stages at the candidate bitrate sweep), and the VMAF / PSNR measurement portion (discrete measurement stages with the FFmpeg quality filters). The Bjontegaard delta-rate aggregation across the resulting (bitrate, quality) curves runs today in external scripts on the pipeline's structured output, not as a pipeline stage. Adding BD-rate as a native stage is on the backlog.
For internal engineering at MpegFlow, BD-rate is part of encoder regression testing — every encoder version update is BD-rate-tested against a representative corpus before deployment, with the encoding work happening on the pipeline and the aggregation happening in supporting tooling.
For customers evaluating encoder changes (e.g., x265 to SVT-HEVC, x264 medium to fast), the same shape applies: pipeline encodes the candidate set, pipeline measures quality, external tooling computes BD-rate from the structured output. We provide the tooling and recommend interpretation.
The strict-broker security model handles the encoding and measurement portions the same as any analysis — workers receive content via short-lived presigned URLs, encode, compute metrics, emit results.
The general guidance: BD-rate is the standard for codec comparison; understand the methodology; use it for decisions but combine with operational considerations (encoding time, hardware compatibility, etc.). Don't rely on BD-rate alone for production choices; do rely on it for codec/encoder efficiency comparisons.