MpegFlow with Prometheus + Grafana: open-source observability

How MpegFlow integrates with Prometheus + Grafana — the open-source observability stack. Native OpenMetrics, recording rules, the dashboards that work, and when this beats Datadog.

Prometheus + Grafana is the open-source observability stack, typically self-hosted or run through a managed service such as Grafana Cloud, Amazon Managed Service for Prometheus, or Google Cloud Managed Service for Prometheus. MpegFlow exports metrics in the OpenMetrics format, which Prometheus scrapes natively, so the integration is direct: add a scrape config, import a Grafana dashboard, and you're done.

How the integration works

MpegFlow exposes /metrics on each coordinator and worker pod in OpenMetrics format. Prometheus scrapes those endpoints through a standard ServiceMonitor or PodMonitor (the Prometheus Operator resources that kube-prometheus-stack installs). Grafana queries Prometheus and renders the dashboards. The whole stack can be self-hosted for sovereign-cloud requirements or run as a managed service for operational simplicity.
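
As a rough illustration of the scrape wiring described above, the sketch below builds a ServiceMonitor manifest as a Python dict and prints it as JSON, which kubectl accepts alongside YAML. The resource name, namespace, pod labels, port name, and Helm release name are all assumptions made for the example, not values published by MpegFlow; substitute whatever your deployment actually uses.

```python
import json


def build_service_monitor(name, namespace, match_labels,
                          release="kube-prometheus-stack",
                          port="metrics", path="/metrics", interval="30s"):
    """Return a Prometheus Operator ServiceMonitor manifest as a dict."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # With default chart values, the kube-prometheus-stack Prometheus
            # selects ServiceMonitors labeled with the Helm release name.
            # The release name here is an assumption.
            "labels": {"release": release},
        },
        "spec": {
            # Selects the Services fronting the coordinator and worker pods.
            "selector": {"matchLabels": match_labels},
            "namespaceSelector": {"matchNames": [namespace]},
            "endpoints": [
                # "port" is the *name* of the Service port, not its number.
                {"port": port, "path": path, "interval": interval},
            ],
        },
    }


if __name__ == "__main__":
    manifest = build_service_monitor(
        name="mpegflow",                                   # hypothetical
        namespace="mpegflow",                              # hypothetical
        match_labels={"app.kubernetes.io/name": "mpegflow"},  # hypothetical
    )
    print(json.dumps(manifest, indent=2))
```

Saving the output to a file and applying it with kubectl apply -f is one way to register the scrape target; a PodMonitor follows the same shape with podMetricsEndpoints in place of endpoints.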

Common patterns

  • kube-prometheus-stack deployment

    The standard K8s pattern: install kube-prometheus-stack via Helm (one chart deploys Prometheus, Grafana, Alertmanager, node-exporter, and kube-state-metrics), then add a ServiceMonitor pointing at MpegFlow's metrics endpoints. Per-server capacity depends on active series count and scrape interval; see "At production scale" for sizing figures.

  • Recording rules for expensive queries

    Some queries (p99 over 7 days across all pools) are expensive to compute on every Grafana refresh. Use Prometheus recording rules to pre-compute them at 1-minute intervals, and point dashboards at the recorded series instead of the raw histogram_quantile expression.

  • Long-term storage with Thanos / Mimir

    Local Prometheus retention is typically 15-90 days. For longer retention (compliance, capacity planning over years), pair Prometheus with Thanos or Grafana Mimir, which keep long-term data in object storage. It is an extra layer to run, but it pays off for any workload that needs trend analysis beyond the local retention window.

  • Sovereign-cloud / on-prem deployments

    For air-gapped or sovereign-cloud requirements, Prometheus + Grafana runs entirely in your perimeter. No outbound calls to a SaaS observability vendor. Pair with self-hosted MpegFlow + MinIO + on-prem Postgres for a fully-isolated stack.
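
To make the recording-rule pattern above concrete, here is a minimal sketch that generates a rule group evaluated at 1-minute intervals. The histogram name mpegflow_job_duration_seconds and the pool label are hypothetical stand-ins, since the metric names MpegFlow actually exports are not listed here. The output is JSON, which is also valid YAML; it is worth running the resulting file through promtool check rules before loading it.

```python
import json

# Hypothetical histogram name; replace with a metric MpegFlow really exports.
HISTOGRAM = "mpegflow_job_duration_seconds"


def recording_rule_group(histogram, by_label, window="5m", interval="1m"):
    """Build a Prometheus rule group that pre-computes a per-label p99."""
    # Naming follows the common level:metric:operations convention.
    rate_series = f"{by_label}:{histogram}_bucket:rate{window}"
    p99_series = f"{by_label}:{histogram}:p99_{window}"
    return {
        "groups": [{
            "name": f"{histogram}_recording",
            "interval": interval,
            "rules": [
                # Step 1: aggregate bucket rates, keeping only le + one label.
                {"record": rate_series,
                 "expr": f"sum by (le, {by_label}) "
                         f"(rate({histogram}_bucket[{window}]))"},
                # Step 2: compute the quantile from the aggregated series.
                {"record": p99_series,
                 "expr": f"histogram_quantile(0.99, {rate_series})"},
            ],
        }]
    }


if __name__ == "__main__":
    print(json.dumps(recording_rule_group(HISTOGRAM, "pool"), indent=2))
```

A dashboard panel would then query the recorded p99 series directly, so each refresh reads a handful of pre-computed series rather than re-evaluating histogram_quantile over every raw bucket.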

Pitfalls

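  1. Each Prometheus server keeps every active series in memory, so high cardinality (per-job labels) blows up memory. Never label metrics with high-cardinality job IDs; aggregate at the source or drop such series at scrape time with metric_relabel_configs, and use recording rules to keep queries over what remains cheap.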
  2. Grafana dashboard queries can be slow on large time ranges. Use $__interval and downsampling to keep dashboards responsive.
  3. Prometheus federation between clusters introduces lag and complexity. Most multi-cluster deployments use Thanos or Mimir instead of native federation.
  4. Long-term storage in Thanos/Mimir is operationally non-trivial. The object storage plus sidecar (Thanos) or remote-write (Mimir) pattern works, but it requires ongoing SRE attention.
  5. The Prometheus + Grafana stack is your responsibility to operate — backup, version upgrades, capacity planning. Managed Datadog removes that operational burden at higher cost.
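
The cardinality pitfall in item 1 is easiest to see with arithmetic. The sketch below counts the series a single histogram produces, first with bounded labels and then with a per-job label. The label counts, bucket count, and job volume are made-up illustrative numbers, not measurements from an MpegFlow deployment.

```python
def histogram_series(label_combinations, buckets):
    """Series produced by one histogram metric.

    Each distinct label combination yields one series per bucket
    (including le="+Inf") plus the _sum and _count series.
    """
    return label_combinations * (buckets + 2)


if __name__ == "__main__":
    BUCKETS = 12  # assumed bucket count, including +Inf

    # Bounded labels: 8 pools x 5 codecs x 3 statuses (illustrative).
    bounded = histogram_series(8 * 5 * 3, BUCKETS)
    print(bounded)  # 1680

    # A job_id label: every job is its own label combination, even if
    # each job only ever reports one pool, codec, and status.
    per_job = histogram_series(50_000, BUCKETS)
    print(per_job)  # 700000
```

Under these assumptions one histogram with a job ID label reaches 700,000 series on its own, which is most of a node's budget at the scale figures quoted in this article.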

At production scale

At MpegFlow production scale, a Prometheus node handles roughly 100K-1M active series. Above 1M series, shard across multiple Prometheus servers and use Thanos (or federation) for a global view. For sovereign-cloud deployments where Datadog isn't an option, Prometheus + Grafana + Thanos handles workloads of 10M+ minutes/month with proper sharding. Operational cost: about 0.5 SRE FTE for a production-grade Prometheus stack across multiple clusters.
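
A back-of-the-envelope sizing check can be sketched from the figures above. It assumes the 1M active series per node ceiling quoted in this section and a uniform scrape interval; the 3.6M series total is an invented example, and real sizing should leave headroom below the ceiling.

```python
import math


def samples_per_second(active_series, scrape_interval_s):
    """Ingest rate: every active series produces one sample per scrape."""
    return active_series / scrape_interval_s


def shards_needed(active_series, max_series_per_node=1_000_000):
    """Minimum number of Prometheus servers to stay under the per-node ceiling."""
    return math.ceil(active_series / max_series_per_node)


if __name__ == "__main__":
    total_series = 3_600_000  # illustrative fleet-wide total
    print(samples_per_second(total_series, 30))  # 120000.0
    print(shards_needed(total_series))           # 4
```

Halving the scrape interval doubles the ingest rate without changing the series count, so the two numbers are worth tracking separately when planning shards.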

Building this stack?

Talk to us about your specific shape.

The integration patterns above cover most production deployments. If your shape is different — sovereign-cloud, regulated workloads, or scale that needs custom routing — beta cohort design partners get founder-direct help with the integration.
