
MpegFlow with PostgreSQL: job state, audit log, multi-tenant tables

How MpegFlow uses PostgreSQL for job state, the append-only audit log, multi-tenant table layout, and the HA patterns that survive production traffic.


PostgreSQL is MpegFlow's operational database — job state, customer metadata, pipeline definitions, and the append-only audit log all live here. We picked Postgres for the JSONB support (workflow DAGs are stored as JSONB), the HA story (managed Postgres on every cloud), and the operational maturity. This page covers the schema shape and the patterns that survive production scale.

How the integration works

Single Postgres database per MpegFlow deployment (or per dedicated-cluster customer for Enterprise tier). Schema includes: `jobs` (job state machine), `audit_events` (append-only event log), `workflows` (DAG definitions as JSONB), `customers` + `worker_pools` (multi-tenant config), `webhooks` (registered webhook receivers). All operational queries hit indexes; long-running analytical queries route to a read replica.
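The `jobs` state machine can be sketched in a few lines. The state names below are assumptions for illustration; MpegFlow's actual state set may differ:

```python
# Hypothetical job states and allowed transitions (illustrative only).
ALLOWED_TRANSITIONS = {
    "queued":    {"running", "cancelled"},
    "running":   {"succeeded", "failed", "cancelled"},
    "failed":    {"queued"},  # a retry re-enqueues the job
    "succeeded": set(),       # terminal
    "cancelled": set(),       # terminal
}

def can_transition(current: str, target: str) -> bool:
    """Return True if a job may legally move from `current` to `target`."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

In SQL the same guard becomes a compare-and-swap, e.g. `UPDATE jobs SET state = 'running' WHERE id = $1 AND state = 'queued'`, so two workers can never double-claim the same job.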

Common patterns

  • Managed Postgres for HA

    AWS RDS Multi-AZ, Google Cloud SQL HA, or Aiven Postgres for managed deployments. Don't run stateful Postgres in K8s — the operational cost of running it well (backups, version upgrades, failover testing) outweighs the benefit. Pair MpegFlow's K8s deployment with managed Postgres.

  • Per-tenant table partitioning

    For multi-tenant deployments, the `audit_events` table is partitioned by customer_id + month. Old partitions are detached for archival or dropped outright. This keeps query plans fast even at billions of audit rows across all tenants.

  • Read replica for analytics

    BI / analytics queries (revenue per customer, throughput trends, etc.) route to a read replica. Production OLTP traffic stays on the primary. Replica lag is acceptable for analytics; never read job state from the replica during active processing.

  • JSONB for workflow definitions

    Workflow DAGs are stored as JSONB. This gives schema flexibility (workflows evolve frequently) with indexable structured queries. GIN indexes on common JSONB paths (e.g., workflow type, output target type) keep queries fast.
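A minimal sketch of the per-tenant partition layout, assuming `audit_events` is list-partitioned by customer with monthly range sub-partitions underneath (the partition names and exact nesting are assumptions, not MpegFlow's documented layout):

```python
from datetime import date

def month_bounds(year: int, month: int) -> tuple:
    """Half-open [start, end) date range covering one calendar month."""
    start = date(year, month, 1)
    end = date(year + (month == 12), month % 12 + 1, 1)
    return start.isoformat(), end.isoformat()

def audit_partition_ddl(customer_id: int, year: int, month: int) -> str:
    """Generate DDL for one customer-month audit partition (sketch)."""
    lo, hi = month_bounds(year, month)
    name = f"audit_events_c{customer_id}_{year}_{month:02d}"
    return (
        f"CREATE TABLE {name} PARTITION OF audit_events_c{customer_id}\n"
        f"    FOR VALUES FROM ('{lo}') TO ('{hi}');"
    )
```

Archival then becomes `ALTER TABLE ... DETACH PARTITION` followed by an export, rather than a mass `DELETE` that bloats the table.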
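The replica-routing rule above (analytics reads go to the replica, job state never does) fits in a small helper. A sketch, assuming the application explicitly flags analytics queries rather than inferring intent:

```python
# Statements that are safe to serve from a lagging replica *if* the
# caller has declared the query to be analytics-only.
READ_ONLY_PREFIXES = ("select", "with", "show", "explain")

def route(query: str, analytics: bool = False) -> str:
    """Return 'replica' or 'primary' for a given query.

    Only explicitly flagged analytics reads go to the replica; job-state
    reads stay on the primary because replica lag could return stale state.
    """
    q = query.lstrip().lower()
    if analytics and q.startswith(READ_ONLY_PREFIXES):
        return "replica"
    return "primary"
```

Keeping the flag explicit (rather than routing all `SELECT`s to the replica) is what prevents a worker from ever reading stale job state mid-pipeline.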
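For the JSONB pattern, the query shape matters: `@>` containment is an operator that a GIN index on the column accelerates. A sketch of building such a query (the `workflows.definition` column name and the `type` key are assumptions for illustration):

```python
import json

def workflows_by_type(workflow_type: str) -> tuple:
    """Build a parameterized JSONB containment query (sketch).

    Returns (sql, params) for use with any DB-API driver. The `@>`
    containment test is servable by a GIN index on the JSONB column.
    """
    pattern = json.dumps({"type": workflow_type})
    sql = "SELECT id FROM workflows WHERE definition @> %s::jsonb"
    return sql, (pattern,)
```

If only containment queries are needed, a GIN index created with the `jsonb_path_ops` operator class is smaller than the default and still serves `@>`.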

Pitfalls

  1. Long-running transactions hold locks on hot rows. Never run schema migrations or analytics queries against the primary during peak job throughput.
  2. `audit_events` grows without bound. Partitioning plus archival to S3 (via pg_cron and a custom export job) is the only sustainable path at scale.
  3. JSONB queries can be slow without GIN indexes — common access patterns must be indexed explicitly. Profile queries on production-shape data.
  4. Postgres HA failover takes 30-90 seconds. Workers should retry against a connection pool that handles failover transparently (PgBouncer in transaction-pool mode).
  5. Connection limits matter: managed Postgres typically limits 100-500 connections. Use PgBouncer for application-side pooling so MpegFlow workers don't exhaust the limit.
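Pitfalls 4 and 5 combine in practice: workers connect through PgBouncer and retry across the failover window. A minimal backoff wrapper, assuming the driver surfaces a dropped connection as `ConnectionError` or `OSError` (exception types vary by driver):

```python
import time

def with_failover_retry(fn, attempts=5, base_delay=0.5,
                        sleep=time.sleep,
                        retriable=(ConnectionError, OSError)):
    """Call `fn`, retrying with exponential backoff across a failover.

    With the defaults this tolerates roughly 15s of outage; widen
    `attempts`/`base_delay` to cover a 30-90s Postgres failover. A sketch:
    in production the pool (PgBouncer) also re-resolves the new primary.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retriable:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, 4s, ...
```

The `sleep` parameter is injected so the backoff schedule can be tested without real waiting.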

At production scale

PostgreSQL at MpegFlow production scale handles 10K-50K writes/sec on properly sized hardware (db.r6i.4xlarge equivalent or similar). Above that, partition workloads by customer or by job type. The audit log dominates write throughput; in extreme-scale deployments, it can move to a separate Postgres instance dedicated to event logging. Read traffic typically scales via replicas — write traffic forces vertical scaling.

Topics
  • postgresql
  • postgres
  • database
  • integration
Building this stack?

Talk to us about your specific shape.

The integration patterns above cover most production deployments. If your shape is different — sovereign-cloud, regulated workloads, or scale that needs custom routing — beta cohort design partners get founder-direct help with the integration.
