Deploying Fluss on Kubernetes
This document guides developers through deploying and hardening a Fluss cluster on Kubernetes using Helm. It covers installation workflows, multi-cluster management, client connectivity, and production optimization strategies.
Overview
The fluss-bundle Helm chart deploys a complete Fluss cluster on Kubernetes. The chart bundles Fluss with ZooKeeper (Bitnami, with the image mirrored to the Ververica registry) and exposes the Fluss client endpoint on port 9124.
The default deployment topology includes the following components:
- Coordinator server: 1 pod
- Tablet servers: 3 pods
- ZooKeeper: 3 pods
The Ververica Platform registry at registry.ververica.cloud hosts all images for both Fluss and ZooKeeper. A single image pull secret covers both of these components.
This manual covers a minimal deployment and common production hardening options. You can find information about remote storage, SASL authentication, and Prometheus monitoring in separate manuals.
Prerequisites
- Kubernetes 1.24+ cluster with kubectl configured and pointing to the target cluster
- Helm 3.8+ (OCI chart support required)
- Access to the Ververica registry (registry.ververica.cloud)
- Sufficient cluster capacity for the default pod count (7 pods) plus your configured resource requests
Registry Credentials
To obtain credentials, view available artifact endpoints, and find registry login commands for both Helm OCI and Docker, see Obtaining Registry Access. The remainder of this manual assumes you have exported REGISTRY_USERNAME and REGISTRY_PASSWORD in your shell.
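Because the later commands rely on these two variables, it can help to fail early when either one is missing. The following is a minimal sketch, assuming a POSIX shell; require_env is a hypothetical helper name and is not part of Helm or kubectl.

```shell
# Fail early when a required environment variable is unset or empty.
# require_env is a hypothetical helper, not a Helm or kubectl command.
require_env() {
  for name in "$@"; do
    # Indirect variable lookup that works in POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required environment variable: $name" >&2
      return 1
    fi
  done
}

# Example usage (commented out so the snippet has no side effects):
# require_env REGISTRY_USERNAME REGISTRY_PASSWORD &&
#   printf '%s' "$REGISTRY_PASSWORD" |
#     helm registry login registry.ververica.cloud \
#       --username "$REGISTRY_USERNAME" --password-stdin
```

Passing the password through --password-stdin keeps it out of the shell history and the process list.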
Create the Image Pull Secret
Create a Kubernetes image pull secret in your target namespace before installing the chart. Both the Fluss and ZooKeeper pods use this secret:
kubectl create namespace fluss
kubectl create secret docker-registry ververica-registry \
  --docker-server=registry.ververica.cloud \
  --docker-username="$REGISTRY_USERNAME" \
  --docker-password="$REGISTRY_PASSWORD" \
  --namespace fluss
Verify Registry Access
Before installing the chart, confirm that both the Helm OCI registry and the Kubernetes image pull path work end-to-end. See Obtaining Registry Access for the canonical artifact paths.
Verify Helm OCI access
Pull the chart locally without installing it by running the following command:
helm pull oci://registry.ververica.cloud/platform-charts/fluss-bundle \
  --version 0.9.1-vv-2 \
  --destination /tmp
A fluss-bundle-0.9.1-vv-2.tgz file should appear under /tmp. Authentication failures surface as 401 Unauthorized or 403 Forbidden. If you encounter these errors, see Troubleshooting.
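Once the archive is present locally, it can also be used to inspect the chart's default values before installing. The sketch below assumes the archive follows the fluss-bundle-<version>.tgz naming shown above; show_chart_defaults is a hypothetical helper that wraps the helm show values command.

```shell
# Print the default values of a locally pulled fluss-bundle chart archive.
# show_chart_defaults is a hypothetical helper name.
show_chart_defaults() {
  version="$1"
  dest="${2:-/tmp}"   # directory passed to `helm pull --destination`
  archive="$dest/fluss-bundle-$version.tgz"
  if [ ! -f "$archive" ]; then
    echo "chart archive not found: $archive" >&2
    return 1
  fi
  helm show values "$archive"
}

# Example usage:
# show_chart_defaults 0.9.1-vv-2 /tmp
```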
Verify Docker image pull from the cluster
Run a one-shot pod in your Fluss namespace that uses the pull secret to fetch the Fluss image:
kubectl -n fluss run registry-check \
  --rm -it --restart=Never \
  --image=registry.ververica.cloud/platform-images/fluss:0.9.1-vv-2 \
  --overrides='{"spec":{"imagePullSecrets":[{"name":"ververica-registry"}]}}' \
  -- /bin/sh -c "echo image pulled successfully"
If the pod prints that the image pulled successfully and exits cleanly, you have wired up the secret correctly. A pod stuck in ImagePullBackOff indicates an authentication or naming problem. If you encounter this issue, see Troubleshooting.
Install the Chart
Choose a Version
The version-discovery workflow depends on the production distribution pipeline, which the Ververica Platform team has not yet finalized. Once the pipeline is established, this section will document how to discover available chart and image versions.
Create a values.yaml file that references the pull secret. You must include the zookeeper.image.pullSecrets entry because the Ververica Platform registry also serves the bundled ZooKeeper image. The fluss.image.tag defaults to the image version shipped with this chart release and should not be overridden:
fluss:
  image:
    pullSecrets:
      - ververica-registry
zookeeper:
  image:
    pullSecrets:
      - ververica-registry
For the full set of configurable fields under fluss:, refer to the Fluss Helm Chart documentation.
Install
helm install fluss oci://registry.ververica.cloud/platform-charts/fluss-bundle \
  --version 0.9.1-vv-2 \
  --namespace fluss \
  -f values.yaml
The production distribution pipeline is not yet finalized, so the exact chart registry path might change.
Upgrade
When you change a value in values.yaml and want to apply it without bumping the chart version, run the following command:
helm upgrade fluss oci://registry.ververica.cloud/platform-charts/fluss-bundle \
  --version 0.9.1-vv-2 \
  --namespace fluss \
  -f values.yaml
To upgrade to a new chart release, point to the new version by running the following command:
helm upgrade fluss oci://registry.ververica.cloud/platform-charts/fluss-bundle \
  --version <NEW_CHART_VERSION> \
  --namespace fluss \
  -f values.yaml
Replace <NEW_CHART_VERSION> with the chart version that you are upgrading to.
Cluster availability during upgrade
The helm upgrade command triggers a rolling restart of the Fluss and ZooKeeper StatefulSets. The single coordinator pod has a brief outage window during its restart, though clients reconnect within seconds. The tablet server replicas restart one at a time. Each restart causes a leader failover for the partitions that the pod leads, which produces short read and write latency spikes. Plan upgrades during a maintenance window if your workload is sensitive to these effects.
Verify the Deployment
Verify that all pods reach the Running state by running the following command:
kubectl get pods -n fluss
With the default values, expect 7 pods: 1 coordinator server pod, 3 tablet server pods, and 3 ZooKeeper pods.
List the services to identify the client endpoint by running the following command:
kubectl get services -n fluss
The coordinator server service exposes the Fluss client port (9124).
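Instead of polling kubectl get pods by hand, the readiness check can be scripted. The following is a minimal sketch built on kubectl wait; wait_for_pods_ready is a hypothetical helper name, and the 300s default timeout is an assumption that you should tune for your cluster.

```shell
# Block until every pod in the namespace reports the Ready condition,
# or until the timeout expires (kubectl then exits with a non-zero status).
# wait_for_pods_ready is a hypothetical helper name.
wait_for_pods_ready() {
  namespace="${1:-fluss}"
  timeout="${2:-300s}"   # assumed default, not a chart setting
  kubectl wait pod --all \
    --for=condition=Ready \
    --namespace "$namespace" \
    --timeout="$timeout"
}

# Example usage:
# wait_for_pods_ready fluss 300s
```

Note that kubectl wait reports an error when the namespace contains no pods at all, so run it only after helm install has returned.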
Managing Multiple Fluss Clusters
Each Helm release in a separate Kubernetes namespace acts as an independent Fluss cluster. No state is shared between namespaces.
To run a second cluster alongside the first, execute the following command:
kubectl create namespace fluss-staging
Create the image pull secret in the new namespace by following the same steps as Create the Image Pull Secret, substituting --namespace fluss-staging.
helm install fluss-staging oci://registry.ververica.cloud/platform-charts/fluss-bundle \
  --version 0.9.1-vv-2 \
  --namespace fluss-staging \
  -f values-staging.yaml
Manage each cluster independently with helm upgrade or helm uninstall commands scoped to its specific namespace.
Fluss service names do not include the Helm release name. Two clusters in the same namespace would have colliding service names, so you must use separate namespaces.
Connecting to Fluss
Fluss clients connect using bootstrap.servers pointing to the coordinator pod through its headless service on port 9124. The Fluss chart uses fixed service names without a Helm release-name prefix, so the bootstrap address is namespace-scoped:
coordinator-server-0.coordinator-server-hs.<namespace>.svc.cluster.local:9124
For a cluster deployed in the fluss namespace, use the following address:
coordinator-server-0.coordinator-server-hs.fluss.svc.cluster.local:9124
The coordinator and tablet pods have readiness and liveness probes configured, so a pod in the Running state with ready containers has already passed the port check. To verify connectivity manually from inside the cluster, run a temporary pod:
kubectl run netcat-check --rm -it --restart=Never \
  --image=busybox:latest -n fluss -- \
  nc -zv coordinator-server-0.coordinator-server-hs.fluss.svc.cluster.local 9124
Substitute fluss with your namespace if it is different.
For information about configuring Flink SQL and Java SDK clients against this bootstrap address, see Reading and Writing Fluss.
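Because the bootstrap address differs only by namespace, scripts that target several clusters can derive it rather than hard-code it. The following is a small sketch; fluss_bootstrap_address is a hypothetical helper name, and the address pattern is the one described in this section.

```shell
# Build the namespace-scoped bootstrap.servers value for a Fluss cluster.
# fluss_bootstrap_address is a hypothetical helper name.
fluss_bootstrap_address() {
  namespace="${1:-fluss}"
  printf '%s\n' "coordinator-server-0.coordinator-server-hs.${namespace}.svc.cluster.local:9124"
}

# Example usage:
# fluss_bootstrap_address fluss-staging
```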
Production Hardening
Persistent Storage
By default, tablet servers use /tmp/fluss/data for the data.dir configuration. This path is an ephemeral, in-pod path that does not survive pod restarts. You must enable persistent volumes for production deployments.
fluss:
  tablet:
    storage:
      enabled: true
      size: 500Gi
      storageClass: gp3
zookeeper:
  persistence:
    enabled: true
    storageClass: gp3
    accessModes: ["ReadWriteOnce"]
    size: 8Gi
    dataLogDir:
      size: 8Gi
Replace gp3 with your cluster's storage class.
To list the available classes, run the following command:
kubectl get storageclasses
Resource Requests and Limits
The Helm chart ships with no resource requests or limits set. You must configure these settings for your production environments:
fluss:
  resources:
    coordinatorServer:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi
    tabletServer:
      requests:
        cpu: "4"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
zookeeper:
  resources:
    requests:
      memory: 2Gi
      cpu: "1"
    limits:
      memory: 2Gi
      cpu: "1"
Replication and Bucketing
The Helm chart sets the following default values:
fluss:
  configurationOverrides:
    default.bucket.number: 3
    default.replication.factor: 3
- default.replication.factor: Specifies how many tablet server replicas hold a copy of each bucket. This value must not exceed the value that you set for fluss.tablet.numberOfReplicas.
- default.bucket.number: Specifies the default number of buckets (shards) per table. Individual tables can override this setting at creation time using the 'bucket.num' table property in the Flink SQL WITH clause. A value equal to or a multiple of the tablet server count distributes the load evenly.
- fluss.tablet.numberOfReplicas: Controls the number of tablet server pods. The default value is 3. You might lower this value in resource-constrained environments, but you must adjust default.replication.factor accordingly.
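The constraints in this list can be checked before running helm install or helm upgrade. The following is a minimal sketch that takes the three numbers as arguments; check_fluss_layout is a hypothetical helper name, and it does not read values.yaml or contact the cluster.

```shell
# Validate the relationship between tablet server count, replication factor,
# and default bucket number described above.
# check_fluss_layout is a hypothetical helper name.
check_fluss_layout() {
  tablets="$1"       # fluss.tablet.numberOfReplicas
  replication="$2"   # default.replication.factor
  buckets="$3"       # default.bucket.number
  if [ "$tablets" -lt 1 ]; then
    echo "error: tablet server count must be at least 1" >&2
    return 1
  fi
  # Hard constraint: a bucket cannot have more replicas than tablet servers.
  if [ "$replication" -gt "$tablets" ]; then
    echo "error: default.replication.factor ($replication) exceeds tablet servers ($tablets)" >&2
    return 1
  fi
  # Soft recommendation: a multiple of the tablet count spreads load evenly.
  if [ $((buckets % tablets)) -ne 0 ]; then
    echo "warning: default.bucket.number ($buckets) is not a multiple of tablet servers ($tablets)" >&2
  fi
  return 0
}

# Example usage with the chart defaults:
# check_fluss_layout 3 3 3
```

The replication check fails the function because the setting is invalid, while the bucket check only warns because an uneven bucket count still works.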
Service Account
The Helm chart creates no service account by default (fluss.serviceAccount.create: false). You can either link an existing service account or have the chart create one for you.
fluss:
  serviceAccount:
    create: true # set to false to use an existing account
    name: fluss-sa
You need a service account when you bind Fluss pods to a workload identity, such as AWS IRSA or GKE Workload Identity. This binding grants the pods access to cloud resources, such as remote storage.
Uninstall
helm uninstall fluss --namespace fluss
The cluster does not delete persistent volume claims automatically. To remove them, you must manually delete the claims:
kubectl delete pvc -n fluss --all
Deleting PVCs is irreversible and permanently removes all stored data.
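Because the deletion cannot be undone, a small guard that lists the claims first and deletes them only on explicit confirmation can reduce the risk of removing the wrong namespace's data. The following is a sketch; delete_fluss_pvcs and its --yes argument are hypothetical and not part of kubectl.

```shell
# List the PVCs in a namespace and delete them only when "--yes" is passed.
# delete_fluss_pvcs and its --yes argument are hypothetical conventions.
delete_fluss_pvcs() {
  namespace="$1"
  confirm="${2:-}"
  # Always show what would be removed first.
  kubectl get pvc --namespace "$namespace"
  if [ "$confirm" != "--yes" ]; then
    echo "dry run: pass --yes as the second argument to delete the claims above" >&2
    return 0
  fi
  kubectl delete pvc --all --namespace "$namespace"
}

# Example usage:
# delete_fluss_pvcs fluss          # list only
# delete_fluss_pvcs fluss --yes    # list, then delete
```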
Troubleshooting
Pods not starting
To troubleshoot issues, you can inspect the pod events and logs by running the following commands:
kubectl describe pod <POD_NAME> -n fluss
kubectl logs <POD_NAME> -n fluss
Common causes for this issue include a missing image pull secret, insufficient cluster resources, or a PVC provisioning failure.
Pod stuck in ImagePullBackOff or ErrImagePull
To find the precise registry error, inspect the pod events by running the following command:
kubectl -n fluss describe pod <POD_NAME>
Common causes for this issue include the following items:
Image pull secret not picked up by the pod
You must set imagePullSecrets on the pod template, not on the namespace itself. Confirm the following details:
- The Helm release was installed with both fluss.image.pullSecrets and zookeeper.image.pullSecrets set to the secret name.
- The secret exists in the same namespace as the Helm release.
To verify the secret exists, run the following command:
kubectl -n fluss get secret ververica-registry
A namespace mismatch between the secret and the release is the most common cause of this issue. In this scenario, the chart installs successfully, but pods fail to pull images.
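To confirm that the secret is actually referenced on a running pod's spec, you can read the imagePullSecrets field directly. The following is a minimal sketch using kubectl's jsonpath output; pod_uses_pull_secret is a hypothetical helper name.

```shell
# Return 0 when the pod's spec lists the given image pull secret,
# 1 when it does not, and 2 when the kubectl lookup itself fails.
# pod_uses_pull_secret is a hypothetical helper name.
pod_uses_pull_secret() {
  namespace="$1"
  pod="$2"
  secret="$3"
  # jsonpath prints the secret names separated by spaces.
  names=$(kubectl --namespace "$namespace" get pod "$pod" \
    -o jsonpath='{.spec.imagePullSecrets[*].name}') || return 2
  for name in $names; do
    if [ "$name" = "$secret" ]; then
      return 0
    fi
  done
  return 1
}

# Example usage:
# pod_uses_pull_secret fluss coordinator-server-0 ververica-registry \
#   && echo "secret is referenced" || echo "secret is missing from the pod spec"
```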
Bundled ZooKeeper Pods Fail to Pull
If the Fluss pods start but the ZooKeeper pods do not, you likely configured the pull secret under fluss.image.pullSecrets but omitted it under zookeeper.image.pullSecrets. Both configuration blocks must reference the secret because the bundled ZooKeeper image is also served from the Ververica Platform registry.
401 Unauthorized or 403 Forbidden on helm registry login or helm pull
- Verify that your username and password are correct and do not contain leading or trailing whitespace.
- Confirm that your credentials were issued specifically for the Fluss projects on the registry, rather than for an unrelated Ververica Platform product.
- Run helm registry logout registry.ververica.cloud to clear stale cached credentials, and then log in again.
- If you also logged in with Docker, run docker logout registry.ververica.cloud and log in again to clear stale cached Docker credentials.
Complete values.yaml example
The following example shows a production-ready values.yaml file that combines all the configuration settings described in this manual:
fluss:
  image:
    pullSecrets:
      - ververica-registry
  configurationOverrides:
    default.bucket.number: 3
    default.replication.factor: 3
  coordinator:
    numberOfReplicas: 1
  tablet:
    numberOfReplicas: 3
    storage:
      enabled: true
      size: 500Gi
      storageClass: gp3
  resources:
    coordinatorServer:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi
    tabletServer:
      requests:
        cpu: "4"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
zookeeper:
  image:
    pullSecrets:
      - ververica-registry
  persistence:
    enabled: true
    storageClass: gp3
    accessModes: ["ReadWriteOnce"]
    size: 8Gi
    dataLogDir:
      size: 8Gi
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      cpu: "1"
      memory: 2Gi
Further Reading
- Apache Fluss Documentation — upstream reference
- Fluss Helm Chart Documentation — complete reference for all fluss: values
- Fluss Configuration Reference — all server configuration keys