Monitoring and Metrics

Out of the box, Flink jobs that run on Ververica Platform: Self-Managed expose metrics through two channels:

  • JMX (Java Management Extensions)
  • Prometheus (HTTP endpoint scraping)

This page describes what is already configured in your deployment so you can plug the data into your own monitoring stack (for example, Prometheus + Grafana).
Setting up or operating Prometheus / Grafana itself is outside the scope of this documentation and remains entirely under your control.

What’s Pre-configured?

1. Pod-level Prometheus Annotations

Every Flink pod (JobManager and TaskManager) includes annotations that instruct a Prometheus scraper to collect metrics automatically:

annotations:
  prometheus.io/path: /metrics
  prometheus.io/port: "9999"
  prometheus.io/scrape: "true"

  • prometheus.io/path: The HTTP path where metrics are exposed (/metrics).
  • prometheus.io/port: The container port (9999) where the metrics endpoint listens.
  • prometheus.io/scrape: Indicates that the pod should be scraped (true).
Tip: If you already run Prometheus in the same cluster (or the Prometheus Operator with an annotation-based scrape configuration), it can discover these pods automatically based on the annotations.
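
Annotation-based discovery only works if the Prometheus server has a scrape job that reads these annotations. The fragment below is a minimal sketch of such a job for prometheus.yml, following the commonly used Kubernetes pod relabeling pattern. The job name flink-pods and the namespace flink-jobs are hypothetical placeholders, not values defined by Ververica Platform; adapt them to your cluster.

```yaml
# Sketch only: job_name and the namespace below are assumed placeholders.
scrape_configs:
  - job_name: flink-pods
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - flink-jobs   # hypothetical namespace that runs the Flink pods
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use the path from prometheus.io/path (/metrics on Flink pods).
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__
      # Rewrite the target address to use the port from prometheus.io/port (9999).
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: $1:$2
        target_label: __address__
      # Carry namespace and pod name over as labels to make dashboards easier to filter.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

Prometheus also needs RBAC permission to list and watch pods in that namespace; how you grant it depends on how Prometheus is installed.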

2. Flink Metric Reporters

The following metric reporters are enabled by default in the Flink cluster configuration:

metrics.reporters: jmx:promappmgr

# JMX Reporter
metrics.reporter.jmx.factory.class: org.apache.flink.metrics.jmx.JMXReporterFactory
metrics.reporter.jmx.port: 10000-10240 # Port range for JMX

# Prometheus Reporter
metrics.reporter.promappmgr.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory

Reporter   | Purpose                                                                             | Where It Listens
JMX        | For JVM-based monitoring tools or exporters.                                        | Ports 10000–10240 on each pod.
Prometheus | Exposes human-readable metrics on the HTTP endpoint defined by the pod annotations. | Port 9999 (/metrics).

Next Steps

  1. Scrape the Metrics

    • Point your in-cluster Prometheus deployment at the Kubernetes namespace that runs your Flink pods so it detects pods with the prometheus.io/scrape: "true" annotation. If you use the Prometheus Operator, ServiceMonitor or PodMonitor objects are an alternative. For more details, visit the official Prometheus documentation website.
  2. Visualize in Grafana

    • Build your own dashboards using the Prometheus data source.
  3. Define Alerts

    • Define alert rules in Prometheus or Grafana Alerting to monitor job health (e.g., restart count, checkpoint failures, backpressure).
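
As a starting point for the alerting step, the following is a sketch of a Prometheus rule file covering the three signals mentioned above. The metric names follow the naming that Flink's Prometheus reporter typically produces (flink_ prefix plus scope and metric name) and the job_name label is assumed to be present; both can differ by Flink version and scope configuration, so verify them against your own /metrics output. The group name, thresholds, and durations are illustrative assumptions.

```yaml
# Sketch only: metric names, labels, and thresholds are assumptions to verify
# against the /metrics output of your own Flink pods.
groups:
  - name: flink-job-health
    rules:
      # Fires when the restart gauge increased during the last 10 minutes.
      - alert: FlinkJobRestarted
        expr: delta(flink_jobmanager_job_numRestarts[10m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Flink job {{ $labels.job_name }} restarted in the last 10 minutes"

      # Fires when at least one checkpoint failed during the last 10 minutes.
      - alert: FlinkCheckpointFailures
        expr: delta(flink_jobmanager_job_numberOfFailedCheckpoints[10m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Flink job {{ $labels.job_name }} has failing checkpoints"

      # Fires when any task of a job spends more than half of each second
      # backpressured, sustained for 5 minutes.
      - alert: FlinkTaskBackpressured
        expr: max by (job_name) (flink_taskmanager_job_task_backPressuredTimeMsPerSecond) > 500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Flink job {{ $labels.job_name }} is backpressured"
```

The restart and failed-checkpoint metrics are treated as gauges here, which is why the rules use delta() rather than rate() or increase().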

No additional configuration inside Ververica Platform: Self-Managed is required. All metrics are emitted automatically once the Flink cluster starts.