
Monitoring and Metrics

Out of the box, Flink jobs that run in a Bring-Your-Own-Cloud (BYOC) workspace expose metrics through two reporters:

  • JMX (Java Management Extensions)
  • Prometheus (HTTP endpoint scraping)

This page describes what is already configured in your BYOC deployment so you can plug the data into your own monitoring stack (for example, Prometheus + Grafana).
Setting up or operating Prometheus / Grafana itself is outside the scope of this documentation and remains entirely under your control.

What’s Pre-configured?

1. Pod-level Prometheus Annotations

Every Flink pod (JobManager and TaskManager) includes annotations that instruct a Prometheus scraper to collect metrics automatically:

annotations:
  prometheus.io/path: /metrics
  prometheus.io/port: "9999"
  prometheus.io/scrape: "true"

  • prometheus.io/path: The HTTP path where metrics are exposed (/metrics).
  • prometheus.io/port: The container port (9999) where the metrics endpoint listens.
  • prometheus.io/scrape: Indicates that the pod should be scraped (true).
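Tip: If you already run Prometheus in the same cluster and it is configured for annotation-based pod discovery, it can discover these pods automatically. The prometheus.io/* annotations are a convention rather than a built-in Prometheus feature, so the scraper needs matching relabeling rules. The Prometheus Operator does not act on these annotations by default and relies on PodMonitor or ServiceMonitor objects instead.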
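
To illustrate how a scraper typically consumes these annotations, here is a minimal sketch of a scrape job for a self-managed prometheus.yml. It follows the commonly used annotation-based relabeling pattern and is not part of the BYOC deployment. The job name flink-byoc-pods and the namespace and pod target labels are assumptions chosen for the example; adapt them to your own setup.

```yaml
# Sketch only: annotation-driven pod discovery for a self-managed Prometheus.
# "flink-byoc-pods" is a placeholder job name, not something BYOC defines.
scrape_configs:
  - job_name: flink-byoc-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use the annotated path (/metrics) as the scrape path.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__
      # Rewrite the target address so it uses the annotated port (9999).
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Carry namespace and pod name over as labels for easier filtering.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

The Prometheus service account also needs permission to list and watch pods in the namespaces where the Flink pods run.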

The following metric reporters are enabled by default in the Flink cluster configuration shipped with BYOC:

metrics.reporters: jmx,promappmgr

# JMX Reporter
metrics.reporter.jmx.factory.class: org.apache.flink.metrics.jmx.JMXReporterFactory
metrics.reporter.jmx.port: 10000-10240 # Port range for JMX

# Prometheus Reporter
metrics.reporter.promappmgr.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory

Reporter   | Purpose                                                                              | Where It Listens
JMX        | For JVM-based monitoring tools or exporters.                                         | Ports 10000–10240 on each pod.
Prometheus | Exposes human-readable metrics on the HTTP endpoint defined by the pod annotations. | Port 9999 (/metrics).

Next Steps

  1. Scrape the Metrics

    • Point your in-cluster Prometheus deployment at the Kubernetes namespace so it detects pods with the prometheus.io/scrape: "true" annotation. If you run the Prometheus Operator, define PodMonitor or ServiceMonitor objects instead, because the Operator does not act on these annotations by default. For more details, see the official Prometheus documentation.
  2. Visualize in Grafana

    • Build your own dashboards using the Prometheus data source.
  3. Define Alerts

    • Define alert rules in Prometheus or Grafana Alerting to monitor job health (e.g., restart count, checkpoint failures, backpressure).
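
As a starting point for step 3, the following is a hedged sketch of a Prometheus rule file covering the three examples mentioned above. The metric names assume Flink's default scope formats as rendered by the Prometheus reporter, and they may differ depending on your Flink version and configuration, so check the /metrics output of your pods before relying on them. The group name, alert names, thresholds, and durations are illustrative assumptions only.

```yaml
# Sketch only: example alert rules for Flink job health.
# Metric names assume Flink's default scope formats; verify them against
# the /metrics endpoint. Thresholds and durations are placeholders.
groups:
  - name: flink-job-health
    rules:
      # Fires when the job's restart count grew over the last 15 minutes.
      - alert: FlinkJobRestarted
        expr: delta(flink_jobmanager_job_numRestarts[15m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Flink job restarted within the last 15 minutes"
      # Fires when the number of failed checkpoints grew over the last 15 minutes.
      - alert: FlinkCheckpointFailures
        expr: delta(flink_jobmanager_job_numberOfFailedCheckpoints[15m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Flink job reported failed checkpoints"
      # Fires when a task is backpressured more than half of the time for 10 minutes.
      - alert: FlinkTaskBackpressured
        expr: flink_taskmanager_job_task_backPressuredTimeMsPerSecond > 500
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Flink task is backpressured more than 50% of the time"
```

delta() is used instead of increase() here because Flink reports these values as gauges rather than counters.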

No additional configuration is required inside Ververica Cloud: Bring-Your-Own-Cloud. All metrics are emitted automatically once the Flink cluster starts.