Monitoring and Metrics
Out of the box, Flink jobs that run in a Bring-Your-Own-Cloud (BYOC) workspace expose metrics by using:
- JMX (Java Management Extensions)
- Prometheus (HTTP endpoint scraping)
This page describes what is already configured in your BYOC deployment so you can plug the data into your own monitoring stack (for example, Prometheus + Grafana).
Setting up or operating Prometheus / Grafana itself is outside the scope of this documentation and remains entirely under your control.
What’s Pre-configured?
1. Pod-level Prometheus Annotations
Every Flink pod (JobManager and TaskManager) includes annotations that instruct a Prometheus scraper to collect metrics automatically:
```yaml
annotations:
  prometheus.io/path: /metrics
  prometheus.io/port: "9999"
  prometheus.io/scrape: "true"
```
- prometheus.io/path: The HTTP path where metrics are exposed (/metrics).
- prometheus.io/port: The container port (9999) where the metrics endpoint listens.
- prometheus.io/scrape: Indicates that the pod should be scraped (true).
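If you already run Prometheus in the same cluster and its scrape configuration honors these annotations (a common convention for Kubernetes pod discovery), it discovers these pods automatically. Note that the annotations are a convention rather than a built-in Prometheus feature: the scraper must be configured to read them.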
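As an illustration of how a scraper can act on these annotations, the following is a minimal sketch of a Prometheus scrape job that uses Kubernetes pod discovery and relabeling. It is not part of the BYOC deployment; the job name `flink-pods` and the namespace `flink-byoc` are hypothetical placeholders that you would replace with your own values.

```yaml
# Sketch of a prometheus.yml fragment (assumed setup, not shipped with BYOC).
scrape_configs:
  - job_name: flink-pods            # hypothetical job name
    kubernetes_sd_configs:
      - role: pod                   # discover individual pods
        namespaces:
          names:
            - flink-byoc            # hypothetical namespace; use your own
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use the prometheus.io/path annotation as the metrics path.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Rewrite the scrape address to use the prometheus.io/port annotation.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Carry namespace and pod name over as labels for easier filtering.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

With a job like this in place, each JobManager and TaskManager pod should appear as a separate target on port 9999 under /metrics. If you run the Prometheus Operator instead of a hand-written configuration, the equivalent selection is usually expressed through its own custom resources rather than through annotations.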
2. Baseline Flink Configuration
The following metric reporters are enabled by default in the Flink cluster configuration shipped with BYOC:
```yaml
metrics.reporters: jmx:promappmgr

# JMX Reporter
metrics.reporter.jmx.factory.class: org.apache.flink.metrics.jmx.JMXReporterFactory
metrics.reporter.jmx.port: 10000-10240 # Port range for JMX

# Prometheus Reporter
metrics.reporter.promappmgr.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
```
Next Steps
- Scrape the Metrics
  - Point your in-cluster Prometheus deployment at the Kubernetes namespace (or use ServiceMonitor objects) so it detects pods with the prometheus.io/scrape: "true" annotation. For more details, visit the official Prometheus documentation website.
- Visualize in Grafana
  - Build your own dashboards using the Prometheus data source.
- Define Alerts
  - Define alert rules in Prometheus or Grafana Alerting to monitor job health (e.g., restart count, checkpoint failures, backpressure).
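The alerting step above can be sketched as a Prometheus rule file covering the three job-health signals mentioned (restarts, checkpoint failures, backpressure). This is an assumption-laden starting point, not a shipped configuration: the metric names follow the naming that Flink's Prometheus reporter produces under its default scope formats, the backpressure metric requires a Flink version that exposes it, and the group name, thresholds, durations, and severities are hypothetical values to tune. Verify the exact metric and label names against your own /metrics output before relying on these rules.

```yaml
# Sketch of a Prometheus rule file (hypothetical thresholds and names).
groups:
  - name: flink-job-health          # hypothetical group name
    rules:
      # The restart count is exported as a gauge, so delta() is used
      # to detect a change over the window.
      - alert: FlinkJobRestarted
        expr: delta(flink_jobmanager_job_numRestarts[15m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Flink job {{ $labels.job_name }} restarted in the last 15 minutes"

      # Fires when the number of failed checkpoints grew in the window.
      - alert: FlinkCheckpointFailures
        expr: delta(flink_jobmanager_job_numberOfFailedCheckpoints[10m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Flink job {{ $labels.job_name }} had failed checkpoints in the last 10 minutes"

      # Fires when a task spends more than half of each second
      # backpressured for 10 minutes (threshold is an assumption).
      - alert: FlinkTaskBackpressured
        expr: flink_taskmanager_job_task_backPressuredTimeMsPerSecond > 500
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Task {{ $labels.task_name }} of job {{ $labels.job_name }} is backpressured"
```

The same expressions can be reused in Grafana Alerting if you prefer to manage rules there; only the surrounding rule format differs.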
No additional configuration inside Ververica Cloud: Bring-Your-Own-Cloud is required. All metrics are emitted automatically once the Flink cluster starts.
Reference Links
- Apache Flink: Metrics
- Prometheus: Scrape Classes