
MpegFlow with PostgreSQL: job state, audit log, multi-tenant tables

How MpegFlow uses PostgreSQL for job state, the append-only audit log, multi-tenant table layout, and the HA patterns that survive production traffic.


PostgreSQL is MpegFlow's operational database — job state, customer metadata, pipeline definitions, and the append-only audit log all live here. We picked Postgres for the JSONB support (workflow DAGs are stored as JSONB), the HA story (managed Postgres on every cloud), and the operational maturity. This page covers the schema shape and the patterns that survive production scale.

How the integration works

Single Postgres database per MpegFlow deployment (or per dedicated-cluster customer for Enterprise tier). Schema includes: `jobs` (job state machine), `audit_events` (append-only event log), `workflows` (DAG definitions as JSONB), `customers` + `worker_pools` (multi-tenant config), `webhooks` (registered webhook receivers). All operational queries hit indexes; long-running analytical queries route to a read replica.
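The `jobs` state machine can be sketched in a few lines. The state names below are assumptions for illustration; MpegFlow's actual state set may differ:

```python
# Hypothetical job states and allowed transitions (illustrative only).
ALLOWED_TRANSITIONS = {
    "queued":    {"running", "cancelled"},
    "running":   {"succeeded", "failed", "cancelled"},
    "failed":    {"queued"},  # a retry re-enqueues the job
    "succeeded": set(),       # terminal
    "cancelled": set(),       # terminal
}

def can_transition(current: str, target: str) -> bool:
    """Return True if a job may legally move from `current` to `target`."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

In SQL the same guard becomes a compare-and-swap, e.g. `UPDATE jobs SET state = 'running' WHERE id = $1 AND state = 'queued'`, so two workers can never double-claim the same job.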

Common patterns

  • Managed Postgres for HA

    AWS RDS Multi-AZ, Google Cloud SQL HA, or Aiven Postgres for managed deployments. Don't run stateful Postgres in K8s — the operational cost of running it well (backups, version upgrades, failover testing) outweighs the benefit. Pair MpegFlow's K8s deployment with managed Postgres.

  • Per-tenant table partitioning

    For multi-tenant deployments, the `audit_events` table is partitioned by customer_id + month. Old partitions are detached for archival or dropped outright. This keeps query plans fast even at billions of audit rows across all tenants.

  • Read replica for analytics

    BI / analytics queries (revenue per customer, throughput trends, etc.) route to a read replica. Production OLTP traffic stays on the primary. Replica lag is acceptable for analytics; never read job state from the replica during active processing.

  • JSONB for workflow definitions

    Workflow DAGs are stored as JSONB. This gives schema flexibility (workflows evolve frequently) with indexable structured queries. GIN indexes on common JSONB paths (e.g., workflow type, output target type) keep queries fast.
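A minimal sketch of the per-tenant partition layout, assuming `audit_events` is list-partitioned by customer with monthly range sub-partitions underneath (the partition names and exact nesting are assumptions, not MpegFlow's documented layout):

```python
from datetime import date

def month_bounds(year: int, month: int) -> tuple:
    """Half-open [start, end) date range covering one calendar month."""
    start = date(year, month, 1)
    end = date(year + (month == 12), month % 12 + 1, 1)
    return start.isoformat(), end.isoformat()

def audit_partition_ddl(customer_id: int, year: int, month: int) -> str:
    """Generate DDL for one customer-month audit partition (sketch)."""
    lo, hi = month_bounds(year, month)
    name = f"audit_events_c{customer_id}_{year}_{month:02d}"
    return (
        f"CREATE TABLE {name} PARTITION OF audit_events_c{customer_id}\n"
        f"    FOR VALUES FROM ('{lo}') TO ('{hi}');"
    )
```

Archival then becomes `ALTER TABLE ... DETACH PARTITION` followed by an export, rather than a mass `DELETE` that bloats the table.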
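The replica-routing rule above (analytics reads go to the replica, job state never does) fits in a small helper. A sketch, assuming the application explicitly flags analytics queries rather than inferring intent:

```python
# Statements that are safe to serve from a lagging replica *if* the
# caller has declared the query to be analytics-only.
READ_ONLY_PREFIXES = ("select", "with", "show", "explain")

def route(query: str, analytics: bool = False) -> str:
    """Return 'replica' or 'primary' for a given query.

    Only explicitly flagged analytics reads go to the replica; job-state
    reads stay on the primary because replica lag could return stale state.
    """
    q = query.lstrip().lower()
    if analytics and q.startswith(READ_ONLY_PREFIXES):
        return "replica"
    return "primary"
```

Keeping the flag explicit (rather than routing all `SELECT`s to the replica) is what prevents a worker from ever reading stale job state mid-pipeline.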
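For the JSONB pattern, the query shape matters: `@>` containment is an operator that a GIN index on the column accelerates. A sketch of building such a query (the `workflows.definition` column name and the `type` key are assumptions for illustration):

```python
import json

def workflows_by_type(workflow_type: str) -> tuple:
    """Build a parameterized JSONB containment query (sketch).

    Returns (sql, params) for use with any DB-API driver. The `@>`
    containment test is servable by a GIN index on the JSONB column.
    """
    pattern = json.dumps({"type": workflow_type})
    sql = "SELECT id FROM workflows WHERE definition @> %s::jsonb"
    return sql, (pattern,)
```

If only containment queries are needed, a GIN index created with the `jsonb_path_ops` operator class is smaller than the default and still serves `@>`.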

Pitfalls

  1. Long-running transactions hold locks on hot rows. Never run schema migrations or analytics queries against the primary during peak job throughput.
  2. `audit_events` grows without bound. Partitioning plus archival to S3 (via pg_cron and a custom export job) is the only sustainable path at scale.
  3. JSONB queries can be slow without GIN indexes — common access patterns must be indexed explicitly. Profile queries on production-shape data.
  4. Postgres HA failover takes 30-90 seconds. Workers should retry against a connection pool that handles failover transparently (PgBouncer in transaction-pool mode).
  5. Connection limits matter: managed Postgres typically limits 100-500 connections. Use PgBouncer for application-side pooling so MpegFlow workers don't exhaust the limit.
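Pitfalls 4 and 5 combine in practice: workers connect through PgBouncer and retry across the failover window. A minimal backoff wrapper, assuming the driver surfaces a dropped connection as `ConnectionError` or `OSError` (exception types vary by driver):

```python
import time

def with_failover_retry(fn, attempts=5, base_delay=0.5,
                        sleep=time.sleep,
                        retriable=(ConnectionError, OSError)):
    """Call `fn`, retrying with exponential backoff across a failover.

    With the defaults this tolerates roughly 15s of outage; widen
    `attempts`/`base_delay` to cover a 30-90s Postgres failover. A sketch:
    in production the pool (PgBouncer) also re-resolves the new primary.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retriable:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, 4s, ...
```

The `sleep` parameter is injected so the backoff schedule can be tested without real waiting.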

At production scale

PostgreSQL at MpegFlow production scale handles 10K-50K writes/sec on properly sized hardware (db.r6i.4xlarge equivalent or similar). Above that, partition workloads by customer or by job type. The audit log dominates write throughput; in extreme-scale deployments, it can move to a separate Postgres instance dedicated to event logging. Read traffic typically scales via replicas — write traffic forces vertical scaling.

Topics
  • postgresql
  • postgres
  • database
  • integration
Building this stack?

Talk to us about your specific shape.

The integration patterns above cover most production deployments. If your shape is different — sovereign-cloud, regulated workloads, or scale that needs custom routing — beta cohort design partners get founder-direct help with the integration.
