AAC — Advanced Audio Coding — is the audio codec virtually every video streaming pipeline emits by default. It has been the audio companion to H.264 since the early 2000s and to HEVC since 2013, and unless you have a specific reason to deviate (live conferencing, podcasts, voice-only content), AAC is what your video pipeline is using. This page is the engineering reference for what AAC is, which AAC variant to pick, and how to encode it well.
What AAC is
AAC is a lossy perceptual audio codec standardized as MPEG-4 Part 3, with multiple object types corresponding to different feature sets and bitrate operating ranges. The variants you actually encounter in a video pipeline:
- AAC-LC (Low Complexity) — the default. The "AAC" everybody means when they say AAC. Standardized in MPEG-2 (1997), unchanged in MPEG-4. Covers ~64 kbps stereo through ~256 kbps stereo with good quality, ~320 kbps stereo at high quality.
- HE-AAC (High Efficiency, also AAC+) — AAC-LC plus Spectral Band Replication (SBR). Reproduces high frequencies parametrically rather than encoding them directly. Quality at low bitrates (~24-64 kbps stereo) is dramatically better than plain AAC-LC.
- HE-AAC v2 (also AAC+ v2) — HE-AAC plus Parametric Stereo (PS). Encodes the stereo image parametrically rather than as separate channels. Quality at very low bitrates (~16-32 kbps stereo) is decent.
- xHE-AAC (Extended HE-AAC) — adds Unified Speech and Audio Coding (USAC) tools and dynamic range control (DRC). Excellent at very low bitrates (12 kbps stereo+) and very high bitrates (~256 kbps stereo). The codec to use when you want one stream to cover broad bitrate and content variability.
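ffmpeg's libfdk_aac encoder selects the HE-AAC variants via -profile:a. A sketch of the low-bitrate operating points (input/output filenames are placeholders; note that neither libfdk_aac nor ffmpeg's native encoder produces xHE-AAC):

```shell
# HE-AAC (AAC-LC + SBR) for the ~24-64 kbps stereo range
ffmpeg -i input.mp4 -c:v copy -c:a libfdk_aac -profile:a aac_he -b:a 48k out_he.mp4

# HE-AAC v2 (SBR + Parametric Stereo) for very low stereo bitrates
ffmpeg -i input.mp4 -c:v copy -c:a libfdk_aac -profile:a aac_he_v2 -b:a 24k out_hev2.mp4
```

Without -profile:a, libfdk_aac encodes plain AAC-LC, which is what you want at 128 kbps and above anyway.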
For most video streaming pipelines, AAC-LC at 128-192 kbps stereo is the operating point you ship. That's the de facto default for premium video streaming, and changing it requires a specific reason.
FDK-AAC vs ffmpeg native
There are two AAC encoders worth knowing about for production pipelines:
FDK-AAC (Fraunhofer)
The reference encoder, developed by Fraunhofer Institute. Quality at the same bitrate is meaningfully better than ffmpeg's native AAC encoder — particularly at lower bitrates (sub-128 kbps) where psychoacoustic modeling matters most. FDK-AAC is the encoder major streaming services use.
The catch: FDK-AAC's source license is not GPL-compatible — Fraunhofer's terms permit use and source redistribution, but an ffmpeg binary built under the GPL that includes it cannot be redistributed. Most ffmpeg distributions therefore do NOT include FDK-AAC. You compile ffmpeg yourself with --enable-libfdk-aac (plus --enable-nonfree) to get it. Pipelines that need FDK-AAC typically self-build ffmpeg or use a commercially licensed ffmpeg distribution.
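If you do self-build, the configure step is short — a sketch from an ffmpeg source checkout, assuming the libfdk-aac development headers are already installed:

```shell
# From an ffmpeg source tree; --enable-nonfree marks the resulting
# binary as non-redistributable, which the FDK license requires.
./configure --enable-libfdk-aac --enable-nonfree
make -j"$(nproc)"
```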
CLI invocation:
ffmpeg -i input.mp4 -c:v copy -c:a libfdk_aac -b:a 192k output.mp4
VBR mode is generally better than CBR for audio:
-c:a libfdk_aac -vbr 4
Per the FDK documentation, VBR modes 1-5 target roughly 32, 40, 48-56, 64, and 96 kbps per channel — about 64, 80, 96-112, 128, and 192 kbps for stereo. Mode 5 corresponds to the ~192 kbps premium operating point; mode 4 sits near the 128 kbps standard tier.
ffmpeg native AAC encoder (aac)
The default AAC encoder in ffmpeg builds. Quality is acceptable at 128 kbps+ stereo but noticeably worse than FDK at lower bitrates. For pipelines that can't use FDK for licensing reasons, this is the fallback.
CLI:
ffmpeg -i input.mp4 -c:v copy -c:a aac -b:a 192k output.mp4
The ffmpeg native AAC encoder has been improving — the 2024-2025 versions of ffmpeg are competitive with FDK at 192 kbps+ for most content. If you're shipping at 192 kbps stereo and your pipeline isn't licensed for FDK, the native encoder is fine. At 128 kbps and below, FDK is meaningfully better.
Bitrate sweet spots
For stereo content (the common case for video):
- 64 kbps — HE-AAC territory. Acceptable for low-bandwidth streaming or speech-heavy content; HE-AAC v2 takes over below ~48 kbps.
- 96 kbps — AAC-LC's lower edge. Decent for general content, audible artifacts on music.
- 128 kbps — AAC-LC. The "transparent enough" point for streaming. The default for most non-premium tiers.
- 160-192 kbps — AAC-LC. The premium streaming default. Audible difference vs 128 kbps on critical listening with good headphones; rarely audible on consumer playback.
- 256 kbps — AAC-LC. Effectively transparent for stereo on consumer hardware. High enough that the codec is rarely the perceptual bottleneck.
- 320 kbps+ — diminishing returns. Most listeners can't tell 256 from 320 in blind tests on consumer playback.
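As a sketch, the stereo operating points above can be produced from one master in a loop (master.mov and the output naming are placeholders):

```shell
# Audio-only AAC-LC renditions at the common stereo operating points.
# -vn drops the video stream; each pass re-encodes from the master.
for br in 128k 160k 192k 256k; do
  ffmpeg -i master.mov -vn -c:a aac -b:a "$br" "aac_stereo_${br}.m4a"
done
```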
For 5.1 surround (less common in streaming, more in download/Blu-ray):
- 384 kbps — AAC-LC 5.1. Acceptable.
- 640 kbps — AAC-LC 5.1. Premium 5.1.
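A minimal 5.1 encode at the operating points above might look like this (master_51.mkv is a placeholder, assumed to carry a six-channel track):

```shell
# AAC-LC 5.1 at 384 kbps; -vn drops video, -ac 6 keeps all six channels
ffmpeg -i master_51.mkv -vn -ac 6 -c:a aac -b:a 384k audio_51.m4a
```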
For 7.1 and Atmos: the AAC variants used vary by platform. Apple's specific constraints differ from Android's. Atmos delivery typically uses E-AC-3 (Dolby Digital Plus) rather than AAC for surround, even when stereo masters are AAC.
Multi-channel encoding
Most video pipelines deliver stereo audio. Multi-channel content has a few additional considerations:
- Channel order matters — AAC's channel ordering is specified (FC, FL, FR, BL, BR, LFE for 5.1, in codec order) but ffmpeg sometimes mis-detects channel layout from input files. Use -channel_layout 5.1 explicitly if you don't trust auto-detection.
- Downmixing — for ABR ladders that include both stereo and 5.1, the stereo version is usually a downmix of the 5.1 master. ffmpeg's pan filter handles this; the LtRt vs LoRo downmix choice matters if you care about Pro Logic II decoding.
- Stereo + 5.1 co-encode — you encode both as separate audio tracks in the output container. HLS and DASH manifests reference them as alternate tracks; players choose based on output capabilities.
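A sketch of an LoRo stereo downmix with the pan filter (master_51.mkv is a placeholder; the 0.707 center/surround coefficients are the conventional choice, and LFE is dropped — adjust per your downmix spec):

```shell
# LoRo downmix: fold center and surrounds into L/R at -3 dB each,
# then encode the result as stereo AAC at the premium operating point.
ffmpeg -i master_51.mkv -vn \
  -af "pan=stereo|FL=FL+0.707*FC+0.707*BL|FR=FR+0.707*FC+0.707*BR" \
  -c:a libfdk_aac -b:a 192k stereo_downmix.m4a
```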
Container compatibility
AAC ships in essentially every video container that matters:
- MP4 / MOV — native. AAC-in-MP4 is the standard. Use the mov/mp4/m4a muxer family.
- MPEG-TS — used for legacy HLS streams. AAC-in-TS uses the ADTS framing format.
- fragmented MP4 (fMP4) — used for HLS modern (LL-HLS, CMAF) and DASH. Same AAC stream, different framing.
- MKV / WebM — supported. Less common because WebM ecosystems usually pair with Opus, not AAC.
For HLS specifically: AAC-in-TS for legacy HLS, AAC-in-fMP4 for CMAF-capable HLS — fMP4 segments have been part of the HLS spec since 2016. Most modern HLS implementations are CMAF-capable.
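A sketch of packaging an AAC rendition as fMP4 HLS segments with ffmpeg's HLS muxer (segment duration and output paths are placeholder choices):

```shell
# CMAF-style HLS: fMP4 segments instead of TS, VOD playlist.
ffmpeg -i input.mp4 -c:v copy -c:a aac -b:a 192k \
  -f hls -hls_time 6 -hls_segment_type fmp4 \
  -hls_playlist_type vod out/stream.m3u8
```

Dropping -hls_segment_type fmp4 falls back to the muxer's default TS segments for legacy HLS.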
When NOT to use AAC
- Voice-only / podcast — Opus at 24-32 kbps mono beats AAC at the same bitrate by a noticeable margin for speech.
- WebRTC live — Opus is mandatory; AAC support across WebRTC implementations is patchy.
- Music streaming — Spotify, Apple Music, etc. use codec choices specific to their pipeline (Spotify uses Vorbis and AAC; Apple Music uses AAC and ALAC). For pure audio products, the codec choice has different drivers than video pipelines.
For video streaming specifically, AAC is the right answer 95%+ of the time.
A note on AAC's licensing
AAC has patent licensing — the AAC patent pool, administered by Via Licensing (now Via LA after its 2023 merger with MPEG LA) — but at this point it's mature, well-understood, and priced into industry economics. Per-stream costs for streaming services exist but are small relative to video royalty costs. The major patent holders are companies that also license MPEG video codecs, so most large streamers have bundled licenses covering both.
For internal/non-commercial use, the pragmatic situation matches H.264: nobody's coming after development setups. For commercial deployment at scale, you license. The Via Licensing AAC license covers AAC-LC, HE-AAC, HE-AAC v2, and xHE-AAC under one agreement, which simplifies the negotiation versus the multi-pool video codec situation.
The licensing detail that occasionally trips engineers: FDK-AAC's source-code license is separate from the AAC patent license. Using FDK-AAC source code in your own product requires Fraunhofer's licensing terms; using the AAC codec format (any encoder) for streaming requires the Via Licensing agreement. Two separate things, often conflated.
AAC encoding gotchas
Audio encoding has fewer edge cases than video, but a few worth knowing:
- Sample rate matching — most video pipelines target 48 kHz or 44.1 kHz. AAC supports both natively. Pipelines that hand off content through different stages sometimes accidentally resample (48 → 44.1 → 48) and lose quality. Pin the sample rate end-to-end.
- Loudness normalization — separate concern from codec choice. EBU R128 and ATSC A/85 are the relevant standards. Encoders don't apply normalization automatically; your pipeline needs to do it before encoding.
- Pre-emphasis flags — legacy field in AAC headers. Some hardware decoders behave incorrectly when the flag is set. Keep it off (the default in modern encoders).
- PCE (Program Config Element) — for non-standard channel configurations. If you encounter "channel layout not supported by decoder" issues, PCE is usually the culprit. Most consumer decoders only handle the canonical layouts.
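The sample-rate and loudness points above can be addressed in the encode step itself — a sketch with pinned 48 kHz output and EBU R128-style loudnorm targets (the specific I/TP/LRA values are illustrative; tune them to your delivery spec):

```shell
# Pin the sample rate with -ar so no stage silently resamples,
# and normalize loudness before the AAC encode.
# Note: single-pass loudnorm applies dynamic correction; for production,
# run loudnorm in two-pass (measured) mode for a linear gain adjustment.
ffmpeg -i input.mp4 -c:v copy \
  -af "loudnorm=I=-23:TP=-1.5:LRA=11" \
  -ar 48000 -c:a libfdk_aac -b:a 192k output.mp4
```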
What MpegFlow does with AAC
MpegFlow's FfmpegExecutor worker image includes both FDK-AAC and FFmpeg's native AAC; the workflow YAML selects per-rendition encoder. Default audio encoding uses FDK-AAC where licensing permits and falls back to FFmpeg's native AAC when it doesn't. The DAG runtime expresses audio encoding as its own stage in the workflow with explicit dependency tracking; per-stage retry handles transient failures.
Bitrate defaults are 192 kbps stereo for premium tiers, 128 kbps stereo for standard tiers; xHE-AAC encoding is configurable for customers who want the broad-bitrate-coverage advantage. For surround content, the workflow YAML supports per-track audio configurations — typical setup is a stereo AAC track at 192 kbps as the default plus an optional 5.1 AAC track at 384 kbps. E-AC-3 / Dolby Atmos encoding is not currently a worker-image-supported encoder — Atmos workflows are operator-side work today (encode externally, mux into the MpegFlow output) rather than pipeline-native encode operations.
The audio side of a video pipeline is one of the parts most teams under-think because video tooling sucks up the engineering attention. Audio bugs in production are also disproportionately bad for user experience — viewers tolerate visual artifacts more than they tolerate audio artifacts. The AAC path stays under regression discipline alongside the video path because it matters operationally even if it's invisible most of the time.