
Multi-tenant security — the strict-broker pattern

How MpegFlow keeps tenant data isolated when workers run customer FFmpeg commands. Strict-broker model, presigned URLs, no credentials on workers, HMAC-signed webhooks.

By MpegFlow Engineering Team · For staff engineers and security architects evaluating MpegFlow for shared-tenant deployments
Multi-tenant security architecture · 10 min read · 2,048 words · May 5, 2026
In this architecture
  1. Use case in scope
  2. The threat model
  3. How the strict-broker works
  4. Step 1: Worker boots with **no** credentials
  5. Step 2: Coordinator generates per-job presigned URLs
  6. Step 3: Worker's network is locked down
  7. Step 4: FFmpeg runs under sandboxing
  8. Strict-broker vs the alternatives
  9. Pattern: workers have IAM roles ("trusted workers")
  10. Pattern: per-tenant clusters ("dedicated tenancy")
  11. Pattern: VM-level isolation per job
  12. Pattern: **Strict-broker (MpegFlow's choice)**
  13. Webhook security
  14. Audit trail — what's recorded per job
  15. Defense-in-depth checklist
  16. Defense-in-depth — what this architecture mitigates and where the next layer lives
  17. How to evaluate this architecture for your team

A worker pod is running customer FFmpeg commands. The customer's input file might be a hand-crafted MP4 designed to exploit a libavformat bug. If that worker has any credential — a database password, an S3 key, a service-mesh certificate — a successful exploit means the attacker can read other tenants' data.

The standard response is "harden the worker." Better libavformat patches, better seccomp, better sandboxing. All worth doing.

The MpegFlow response is more aggressive: the worker has no credentials at all. Not stripped-down credentials, not least-privileged credentials. Zero. The strict-broker pattern isn't "the worker has slightly fewer secrets than before"; it's that the worker is structurally incapable of accessing anything beyond its current job's inputs and outputs, because it doesn't know how.

This document covers the multi-tenant security architecture as it runs in production.

#Use case in scope

Any deployment where:

  • More than one customer / tenant / org runs jobs on the same physical fleet
  • Customer-supplied input files run through FFmpeg (which has had real CVEs over the years — CVE-2023-49502, CVE-2024-7272, etc.)
  • A breach of tenant isolation would be material — contractual penalty, reputational damage, regulatory exposure

This is most video infrastructure. Even nominally single-tenant SaaS (where each customer gets a dedicated cluster) often runs co-tenant pipelines for cost reasons.

#The threat model

We model an adversary who:

  1. Submits jobs as a legitimate customer (has valid API credentials for their organization)
  2. Crafts input files designed to exploit FFmpeg vulnerabilities — buffer overflow, integer underflow, malformed container parsing, etc.
  3. Achieves arbitrary code execution within a worker process
  4. Wants to: read other tenants' files, modify other tenants' jobs, exfiltrate credentials, persist on the host, pivot laterally

The defenses are layered:

graph TB
    A["Adversary submits<br/>malicious input"]
    A --> B["Worker fetches input<br/>via presigned URL only"]
    B --> C{"Input parses<br/>cleanly?"}
    C -->|"No"| F["FFmpeg fails fast<br/>(no exec)"]
    C -->|"Yes (or compromised)"| D["FFmpeg runs<br/>under seccomp +<br/>resource limits"]
    D --> E{"Compromised?"}
    E -->|"No"| OK1["Job completes<br/>normally"]
    E -->|"Yes — RCE"| G["Worker has<br/>no credentials"]
    G --> H{"Try to access<br/>other tenant data?"}
    H -->|"Network: blocked"| BLOCK1["No path to DB"]
    H -->|"S3 keys: none"| BLOCK2["No credentials<br/>to assume"]
    H -->|"Lateral: blocked"| BLOCK3["No service-mesh<br/>identity"]
    BLOCK1 --> CONTAIN["Worker process killed<br/>by health-check timeout"]
    BLOCK2 --> CONTAIN
    BLOCK3 --> CONTAIN

The critical claim: even with arbitrary code execution inside the worker, the attacker cannot read another tenant's data. Defense-in-depth, but the structural defense is the credential vacuum.

#How the strict-broker works

#Step 1: Worker boots with **no** credentials

Every other process in your stack has something. A database password. An S3 IAM role. A service mesh cert. A vault token.

An MpegFlow worker has:

  • A worker authentication token (a shared secret that proves only "this is a real worker"), used to authenticate to the gRPC coordinator. It authorizes nothing else: no reads, no writes.
  • A container image hash (so you can verify what version it ran)
  • That's it. No DB credentials. No S3 IAM role. No service-mesh identity.

#Step 2: Coordinator generates per-job presigned URLs

When the coordinator assigns a job to a worker, it generates short-lived presigned URLs for only the assets that job needs:

  • Input URLs: presigned GET URLs for each input asset. TTL: 1 hour. Single-use semantics aren't enforceable in S3, but the short TTL leaves no useful replay window.
  • Output URLs: presigned PUT URLs for each output the job will produce. TTL: 1 hour.

The presigned URLs encode the exact resource path. They are not credentials in the lateral-movement sense — even if exfiltrated, they only authorize access to this specific asset, for this short window.

sequenceDiagram
    participant Coord as Coordinator<br/>(has S3 signing key)
    participant Worker as Worker<br/>(no credentials)
    participant S3 as S3 / MinIO

    Worker->>Coord: gRPC GetWorkAssignment(worker_token)
    Coord->>Coord: Authenticate worker
    Coord->>Coord: Pick a pending job for this pool
    Coord->>S3: GeneratePresignedURL(input.mp4, GET, 1h)
    Coord->>S3: GeneratePresignedURL(output.mp4, PUT, 1h)
    Coord-->>Worker: JobAssignment {<br/>input_urls: [...],<br/>output_urls: [...],<br/>ffmpeg_cmd: ...<br/>}

    Worker->>S3: HTTP GET input.mp4 (presigned URL only)
    S3-->>Worker: input bytes
    Worker->>Worker: Run FFmpeg (sandboxed)
    Worker->>S3: HTTP PUT output.mp4 (presigned URL only)
    S3-->>Worker: 200 OK
    Worker->>Coord: gRPC ReportJobStatus(completed)

#Step 3: Worker's network is locked down

Workers can reach:

  • The gRPC coordinator endpoint (mTLS optional, worker token required)
  • The presigned-URL host (S3, MinIO) — over public TLS, no IAM headers from the worker

Workers cannot reach:

  • The Postgres database directly
  • The Redis instance directly
  • Other tenants' worker pods (NetworkPolicy denies pod-to-pod traffic)
  • The host's metadata service (IMDS is blocked at the pod-network level — no AWS role fishing)
  • Any internal service (no service-mesh sidecar with internal cert)

This is enforced by Kubernetes NetworkPolicy and the node-level firewall, not at the application level. A compromised worker can't bypass it.
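A minimal sketch of that egress allowlist as a Kubernetes NetworkPolicy. The labels, ports, and selectors here are illustrative assumptions, not MpegFlow's shipped manifests:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: worker-egress-allowlist
spec:
  podSelector:
    matchLabels:
      app: mpegflow-worker          # hypothetical worker label
  policyTypes:
    - Ingress                        # no ingress rules listed: all inbound denied
    - Egress
  egress:
    - to:                            # gRPC coordinator only
        - podSelector:
            matchLabels:
              app: mpegflow-coordinator
      ports:
        - protocol: TCP
          port: 9090                 # assumed coordinator gRPC port
    - to:                            # presigned-URL host (S3/MinIO) over TLS
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32   # deny the cloud metadata service
      ports:
        - protocol: TCP
          port: 443
    - ports:                         # DNS, needed to resolve the S3 host
        - protocol: UDP
          port: 53
```

In practice you would pin the S3 egress to the provider's published CIDR ranges rather than 0.0.0.0/0, and pair this with the node-level firewall as the article notes.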

#Step 4: FFmpeg runs under sandboxing

Inside the worker container:

  • Read-only root filesystem — write access only to the working directory
  • Memory cap — kills jobs that balloon (often the symptom of a malformed input)
  • CPU cgroup — limits one job's CPU steal to other jobs in the pool
  • Seccomp profile — restricts syscalls to the set FFmpeg actually needs
  • Drop all capabilities — no CAP_NET_ADMIN, no CAP_SYS_PTRACE, etc.
  • Non-root UID — worker process and FFmpeg both run as a non-root user

These don't prevent code execution within the seccomp-allowed surface, but they prevent escape into the broader host context.
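The bullets above map almost one-to-one onto a container-level securityContext. A sketch, where the UID, resource limits, image digest, and seccomp profile path are all assumptions:

```yaml
# Worker pod spec fragment; values are illustrative, not MpegFlow's defaults.
containers:
  - name: worker
    image: mpegflow/worker@sha256:...        # pinned by digest (placeholder)
    securityContext:
      runAsNonRoot: true
      runAsUser: 10001                       # non-root UID (assumed)
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]                        # no CAP_NET_ADMIN, CAP_SYS_PTRACE, ...
      seccompProfile:
        type: Localhost
        localhostProfile: profiles/ffmpeg.json   # FFmpeg syscall allowlist
    resources:
      limits:
        memory: "4Gi"     # a ballooning malformed input gets OOM-killed
        cpu: "4"          # cgroup cap limits CPU steal from pool neighbors
    volumeMounts:
      - name: scratch
        mountPath: /work  # the only writable path
volumes:
  - name: scratch
    emptyDir: {}
```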

#Strict-broker vs the alternatives

The pattern is unusual in video infrastructure. More common alternatives:

#Pattern: workers have IAM roles ("trusted workers")

The default for most teams. Worker pods run with an IAM role granting S3 read/write to the input/output buckets. Simple, fast, the obvious thing.

Problem: the IAM role grants access to the bucket, not the specific object. A compromised worker reads any tenant's data in the bucket. Mitigation requires per-tenant bucket separation, which adds operational complexity (and most teams don't actually do it).

#Pattern: per-tenant clusters ("dedicated tenancy")

Each customer gets their own cluster. No shared workers, no co-tenancy. Strongest isolation.

Problem: expensive. A 100-customer SaaS becomes 100 clusters, each underutilized. Most providers don't offer this except at very high tiers.

#Pattern: VM-level isolation per job

Each FFmpeg invocation runs in a fresh VM (Firecracker, Kata Containers). Hardware-level isolation between jobs.

Problem: cold-start latency. VMs take 100ms-2s to provision; for short jobs that can double total latency.

#Pattern: **Strict-broker (MpegFlow's choice)**

Workers run shared, but they have no credentials. Compromise of the worker process can't reach beyond the current job.

Trade-off: every file transfer goes through presigned URL generation (one extra round-trip per asset). Adds ~10-50ms per job in coordinator load. For broadcast workloads where jobs run minutes-to-hours, this is invisible.

#Webhook security

Outbound webhooks from MpegFlow to your application are HMAC-SHA256 signed. The signature header (X-MpegFlow-Signature) is verifiable using the webhook secret you configured at webhook creation time.

sequenceDiagram
    participant Bus as EventBus
    participant WH as WebhookService
    participant Exec as Webhook Executor
    participant App as Your Application

    Bus->>WH: emit(JobCompleted)
    WH->>WH: Build payload (resource snapshot)
    WH->>WH: Sign with HMAC-SHA256(secret, body)
    WH->>Exec: Queue webhook delivery

    Exec->>App: POST /your/webhook<br/>Headers:<br/>  X-MpegFlow-Signature: hmac=...<br/>  X-MpegFlow-Timestamp: ...<br/>  X-MpegFlow-Event: job.completed
    App->>App: Verify signature<br/>(reject if invalid)
    App-->>Exec: 200 OK

    alt Failure
        Exec->>Exec: Backoff (1m → 5m → 30m)
        alt 10+ consecutive failures
            Exec->>WH: Disable webhook<br/>(circuit breaker)
        end
    end

Notes:

  • The signature includes a timestamp to prevent replay attacks. Reject any delivery older than 5 minutes.
  • Failed deliveries retry with exponential backoff. After 10 consecutive failures, the webhook is automatically disabled (circuit breaker) — your application stays healthy even when your webhook handler is broken.
  • Failed deliveries persist in webhook_deliveries table; you can replay them via API.
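A receiving application can verify deliveries with a sketch like the following. The header format (`hmac=<hex>`) follows the diagram above, but the exact signed payload is an assumption — MpegFlow's scheme may, for example, sign the timestamp together with the body, so check the webhook docs before relying on this:

```python
import hashlib
import hmac
import time

def verify_webhook(secret: bytes, body: bytes, signature_header: str,
                   timestamp: str, max_age_seconds: int = 300) -> bool:
    """Verify an HMAC-SHA256 webhook delivery (illustrative helper).

    Rejects stale deliveries (5-minute window) to block replays, then
    compares signatures in constant time.
    """
    if abs(time.time() - float(timestamp)) > max_age_seconds:
        return False
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = signature_header.removeprefix("hmac=")
    return hmac.compare_digest(expected, received)
```

The `compare_digest` call matters: a naive `==` comparison leaks timing information an attacker can use to forge signatures byte by byte.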

#Audit trail — what's recorded per job

The strict-broker pattern is only useful if you can verify it worked. Every job records:

| Field | Why it matters |
| --- | --- |
| job.id, job.org_id, job.workflow_id | Tenant isolation auditability |
| Worker assignment timestamp + worker_id | Which physical worker ran this job |
| FFmpeg command (full) | Exact reproducibility — can replay any job |
| FFmpeg version + container hash | Know which binary actually ran |
| Input asset hashes (SHA-256) | Confirm exactly which file was processed |
| Output asset hashes | Confirm exactly what was produced |
| Stage-by-stage timestamps | Validation, encode, package each tracked |
| Exit code + stderr (full, compressed) | Debug failures + flag suspicious patterns |
| Retry history (if any) | Each attempt's worker, params, outcome |
These records are append-only in PostgreSQL (audit_logs table), backed up nightly, retained per your contractual requirements (often 7+ years for broadcast).

For regulated industries, this trail satisfies most "we need to know exactly what happened to this content" questions without additional tooling.
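As a sketch, one audit row could be assembled like this. Field names follow the table above; the helper and the exact schema are assumptions, not MpegFlow's actual table definition:

```python
import hashlib
from datetime import datetime, timezone

def sha256_hex(data: bytes) -> str:
    """Content hash recorded for each input/output asset."""
    return hashlib.sha256(data).hexdigest()

def build_audit_record(job_id: str, org_id: str, worker_id: str,
                       ffmpeg_cmd: str, inputs: dict, outputs: dict,
                       exit_code: int) -> dict:
    """Assemble one append-only audit row (illustrative schema)."""
    return {
        "job_id": job_id,
        "org_id": org_id,
        "worker_id": worker_id,
        "assigned_at": datetime.now(timezone.utc).isoformat(),
        "ffmpeg_cmd": ffmpeg_cmd,
        "input_hashes": {k: sha256_hex(v) for k, v in inputs.items()},
        "output_hashes": {k: sha256_hex(v) for k, v in outputs.items()},
        "exit_code": exit_code,
    }
```

Hashing both sides of the job is what lets an auditor answer "was this exact file processed, and is this exact file what came out" without trusting the worker's word for it.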

#Defense-in-depth checklist

When deploying for a multi-tenant production environment, verify:

| Layer | Check |
| --- | --- |
| Worker credentials | Worker pod IAM role: empty. Service account: bound to worker-only Role with zero access to tenant data. |
| Network | Kubernetes NetworkPolicy denies pod-to-pod between tenants. Egress allowlist: only coordinator gRPC + S3 endpoints. IMDS blocked. |
| Container | Read-only rootfs. Drop all capabilities. Run as non-root. Seccomp profile restricting syscalls to FFmpeg-needed set. Memory + CPU cgroups. |
| Image | Worker image scanned on every deploy (Trivy, Snyk, etc.). Signed via cosign or equivalent. Pinned to specific FFmpeg version. |
| Coordinator | Presigned URL TTL ≤ 1 hour. URL generation logs every request. Signing key rotation cadence documented. |
| Webhooks | HMAC-SHA256 signing enforced on every delivery. Timestamp window ≤ 5 minutes. Circuit breaker disabling failed endpoints. |
| Audit | Full provenance per job (encoder version, params, input/output hashes). Append-only table. Backups. Retention matched to contracts. |
| Observability | Worker process resource exhaustion alerts. Suspicious-syscall alerts (auditd or eBPF). Coordinator authentication failure rate. |

#Defense-in-depth — what this architecture mitigates and where the next layer lives

Strict-broker is one layer of a multi-layer defense. Top-tier security teams expect to see how the whole stack composes; here's how.

| Concern | Strict-broker contribution | Where the next defense layer is |
| --- | --- | --- |
| Worker compromise via FFmpeg exploit | Worker has no credentials → blast radius capped at the current job | Container hardening (seccomp, drop capabilities, read-only rootfs) — applied today |
| Coordinator compromise | Coordinator is the trust anchor — runs in a hardened tier with strict ingress controls | Coordinator-tier hardening: limited egress, mTLS, per-deploy audit, separated infrastructure-as-code |
| PostgreSQL compromise | Audit log + metadata co-located by design (operational simplicity) | Encryption at rest with customer-managed KMS, access logging, periodic rotation. For highest-threat workloads: per-tenant database isolation (Enterprise tier) |
| Side channels (Spectre/Meltdown class) | Multi-tenant workers run on shared nodes; structurally vulnerable to CPU-level side channels | For high-threat workloads, request dedicated nodes per tier — supported in Enterprise plans |
| Denial of service | Per-pool resource caps prevent a single tenant from exhausting shared workers | Application-level rate limiting + quotas (Free/Starter/Pro plan tiers); CDN-level DDoS protection at the perimeter |
| Insider threats | All admin actions logged to append-only audit table | SOC 2 Type II controls (least privilege, periodic access review, separation of duties). Audit window opens 2026 Q4 |

Each row reads as: "MpegFlow's structural choice + the standard production hardening that goes with it." For procurement security questionnaires, this is the framing your security review will use anyway — we're just being explicit about it up front.

#How to evaluate this architecture for your team

If you're on a team where multi-tenant security is a procurement gate:

  1. Ask vendors directly: "Do your workers have IAM roles?" If yes, ask how they isolate at the bucket level.
  2. Ask: "What does the worker know after a successful exploit?" If the answer is anything more than "the URL of the input file currently being processed," you have lateral-movement exposure.
  3. Ask for the audit trail per job. The depth of provenance tells you how seriously they took multi-tenancy.
  4. Run the defense-in-depth checklist above against any platform you evaluate. Most fail at least 3 of those rows.

If our approach matches what your security team needs, apply to the design partner program — multi-tenant security is one of the conversations we have most often with broadcast and OTT operators.

Topics
  • reference architecture
  • Security
  • multi-tenancy
  • strict-broker
  • presigned-urls
See also

Related architectures and reading

  • Architecture
    DRM packaging pipeline
    Widevine, FairPlay, PlayReady via SPEKE — the protected-content path
  • Architecture
    Live ingest + low-latency packaging
    Production architecture for live broadcast video
  • Architecture
    Cost-aware spot-instance pool
    Spot economics, interruption handling, the cost math
Want to deploy this?

Apply to the design partner cohort.

We work directly with engineering teams deploying architectures like this one — free during beta, founder-direct, real influence on the roadmap.
