Deploying Fluss on Kubernetes

Applies to: BYOC, Self-Managed v3

This document guides developers through deploying and hardening a Fluss cluster on Kubernetes using Helm. It covers installation workflows, multi-cluster management, client connectivity, and production optimization strategies.

Overview

The fluss-bundle Helm chart deploys a complete Fluss cluster on Kubernetes. The chart bundles Fluss with ZooKeeper (Bitnami, with the image mirrored to the Ververica registry) and exposes the Fluss client endpoint on port 9124.

The default deployment topology includes the following components:

Component	Count
Coordinator server	1
Tablet servers	3
ZooKeeper nodes	3

The Ververica Platform registry at registry.ververica.cloud hosts all images for both Fluss and ZooKeeper. A single image pull secret covers both of these components.

This manual covers a minimal deployment and common production hardening options. You can find information about remote storage, SASL authentication, and Prometheus monitoring in separate manuals.

Prerequisites

Kubernetes 1.24+ cluster with kubectl configured and pointing to the target cluster
Helm 3.8+ (OCI chart support required)
Access to the Ververica registry (registry.ververica.cloud)
Sufficient cluster capacity for the default pod count (7 pods) plus your configured resource requests
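
The Helm version floor above can be checked from a shell before you start. The following is a minimal sketch, not something shipped with the chart: version_ge is a hypothetical helper name, and it assumes that sort -V (GNU coreutils) is available on your workstation.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Hypothetical helper; relies on `sort -V` for version-aware ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Query Helm only when the binary is present, so the sketch is safe to source anywhere.
if command -v helm >/dev/null 2>&1; then
  helm_version="$(helm version --template '{{.Version}}')"   # for example v3.14.0
  if version_ge "${helm_version#v}" "3.8.0"; then
    echo "Helm ${helm_version} supports OCI charts"
  else
    echo "Helm ${helm_version} is older than 3.8" >&2
  fi
fi
```

The same helper can compare the Kubernetes server version against 1.24 if you extract it from kubectl version yourself.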

Registry Credentials

To obtain credentials, view available artifact endpoints, and find registry login commands for both Helm OCI and Docker, see Obtaining Registry Access. The remainder of this manual assumes you have exported REGISTRY_USERNAME and REGISTRY_PASSWORD in your shell.

Create the Image Pull Secret

Create a Kubernetes image pull secret in your target namespace before installing the chart. Both the Fluss and ZooKeeper pods use this secret:

BASH

kubectl create namespace fluss
kubectl create secret docker-registry ververica-registry \
  --docker-server=registry.ververica.cloud \
  --docker-username="$REGISTRY_USERNAME" \
  --docker-password="$REGISTRY_PASSWORD" \
  --namespace fluss

Verify Registry Access

Before installing the chart, confirm that both the Helm OCI registry and the Kubernetes image pull path work end-to-end. See Obtaining Registry Access for the canonical artifact paths.

Verify Helm OCI access

Pull the chart locally without installing it by running the following command:

BASH

helm pull oci://registry.ververica.cloud/platform-charts/fluss-bundle \
  --version 0.9.1-vv-2 \
  --destination /tmp

A fluss-bundle-0.9.1-vv-2.tgz file should appear under /tmp. Authentication failures surface as 401 Unauthorized or 403 Forbidden. If you encounter these errors, see Troubleshooting.

Verify Docker image pull from the cluster

Run a one-shot pod in your Fluss namespace that uses the pull secret to fetch the Fluss image:

BASH

kubectl -n fluss run registry-check \
  --rm -it --restart=Never \
  --image=registry.ververica.cloud/platform-images/fluss:0.9.1-vv-2 \
  --overrides='{"spec":{"imagePullSecrets":[{"name":"ververica-registry"}]}}' \
  -- /bin/sh -c "echo image pulled successfully"

If the pod prints that the image pulled successfully and exits cleanly, you have wired up the secret correctly. A pod stuck in ImagePullBackOff indicates an authentication or naming problem. If you encounter this issue, see Troubleshooting.

Install the Chart

Choose a Version

Important

The version-discovery workflow depends on the production distribution pipeline, which the Ververica Platform team has not yet finalized. Once the pipeline is established, this section will document how to discover available chart and image versions.

Create a values.yaml file that references the pull secret. You must include the zookeeper.image.pullSecrets entry because the Ververica Platform registry also serves the bundled ZooKeeper image. The fluss.image.tag defaults to the image version shipped with this chart release and should not be overridden:

YAML

fluss:
  image:
    pullSecrets:
      - ververica-registry
zookeeper:
  image:
    pullSecrets:
      - ververica-registry

For the full set of configurable fields under fluss:, refer to the Fluss Helm Chart documentation.

Install

BASH

helm install fluss oci://registry.ververica.cloud/platform-charts/fluss-bundle \
  --version 0.9.1-vv-2 \
  --namespace fluss \
  -f values.yaml

Note

The production distribution pipeline is not yet finalized, so the exact chart registry path might change.

Upgrade

When you change a value in values.yaml and want to apply it without bumping the chart version, run the following command:

BASH

helm upgrade fluss oci://registry.ververica.cloud/platform-charts/fluss-bundle \
  --version 0.9.1-vv-2 \
  --namespace fluss \
  -f values.yaml

To upgrade to a new chart release, point to the new version by running the following command:

BASH

helm upgrade fluss oci://registry.ververica.cloud/platform-charts/fluss-bundle \
  --version <NEW_CHART_VERSION> \
  --namespace fluss \
  -f values.yaml

Warning

Cluster availability during upgrade
The helm upgrade command triggers a rolling restart of the Fluss and ZooKeeper StatefulSets. The single coordinator pod has a brief outage window during its restart, though clients reconnect within seconds. The tablet server replicas restart one at a time. Each restart causes a leader failover for the partitions that the pod leads, which produces short read and write latency spikes. Plan upgrades during a maintenance window if your workload is sensitive to these effects.
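
To know when the rolling restart has finished, you can wait on each StatefulSet in turn. The sketch below is illustrative only: wait_for_fluss_rollout is a hypothetical helper, and the StatefulSet names are assumptions inferred from the pod names listed under Verify the Deployment (a StatefulSet pod is named after its StatefulSet plus an ordinal). Confirm the names with kubectl get statefulsets -n fluss before relying on it.

```shell
# wait_for_fluss_rollout NAMESPACE [TIMEOUT]
# Blocks until each StatefulSet finishes its rolling update and
# returns non-zero as soon as one of them fails or times out.
# The StatefulSet names below are assumptions, not values confirmed by the chart.
wait_for_fluss_rollout() {
  local namespace="${1:?namespace is required}"
  local timeout="${2:-10m}"
  local sts
  for sts in fluss-zookeeper coordinator-server tablet-server; do
    kubectl rollout status "statefulset/${sts}" \
      --namespace "$namespace" --timeout="$timeout" || return 1
  done
}
```

After helm upgrade returns, a call such as wait_for_fluss_rollout fluss 15m gives a single exit code that a maintenance script can act on.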

Verify the Deployment

Verify that all pods reach the Running state by running the following command:

BASH

kubectl get pods -n fluss

Expected pods include the following:

Pod	Count
coordinator-server-0	1
tablet-server-0/1/2	3
fluss-zookeeper-0/1/2	3

List the services to identify the client endpoint by running the following command:

BASH

kubectl get services -n fluss

The coordinator server service exposes the Fluss client port (9124).

Managing Multiple Fluss Clusters

Each Helm release in a separate Kubernetes namespace acts as an independent Fluss cluster. No state is shared between namespaces.

To run a second cluster alongside the first, execute the following command:

BASH

kubectl create namespace fluss-staging

Create the image pull secret in the new namespace by following the same steps as Create the Image Pull Secret, substituting --namespace fluss-staging.

BASH

helm install fluss-staging oci://registry.ververica.cloud/platform-charts/fluss-bundle \
  --version 0.9.1-vv-2 \
  --namespace fluss-staging \
  -f values-staging.yaml

Manage each cluster independently with helm upgrade or helm uninstall commands scoped to its specific namespace.

Important

Fluss service names do not include the Helm release name. Two clusters in the same namespace would have colliding service names, so you must use separate namespaces.

Connecting to Fluss

Fluss clients connect using bootstrap.servers pointing to the coordinator pod through its headless service on port 9124. The Fluss chart uses fixed service names without a Helm release-name prefix, so the bootstrap address is namespace-scoped:

TEXT

coordinator-server-0.coordinator-server-hs.<namespace>.svc.cluster.local:9124

For a cluster deployed in the fluss namespace, use the following address:

TEXT

coordinator-server-0.coordinator-server-hs.fluss.svc.cluster.local:9124
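
If you run several clusters, it can help to derive the address from the namespace instead of typing it out. The following is a small sketch that applies the pattern shown above; fluss_bootstrap_address is a hypothetical helper name, and the optional second argument exists only for illustration.

```shell
# fluss_bootstrap_address NAMESPACE [PORT]
# Prints the in-cluster bootstrap address of the coordinator pod.
# Hypothetical helper; the DNS pattern is the one given in this section.
fluss_bootstrap_address() {
  local namespace="${1:?namespace is required}"
  local port="${2:-9124}"
  printf 'coordinator-server-0.coordinator-server-hs.%s.svc.cluster.local:%s\n' \
    "$namespace" "$port"
}

# Address for a second cluster deployed in the fluss-staging namespace.
fluss_bootstrap_address fluss-staging
```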

The coordinator and tablet pods have readiness and liveness probes configured, so a pod in the Running state with ready containers has already passed the port check. To verify connectivity manually from inside the cluster, run a temporary pod:

BASH

kubectl run netcat-check --rm -it --restart=Never \
  --image=busybox:latest -n fluss -- \
  nc -zv coordinator-server-0.coordinator-server-hs.fluss.svc.cluster.local 9124

Substitute fluss with your namespace if it is different.

For information about configuring Flink SQL and Java SDK clients against this bootstrap address, see Reading and Writing Fluss.

Production Hardening

Persistent Storage

By default, tablet servers use /tmp/fluss/data for the data.dir configuration. This path is an ephemeral, in-pod path that does not survive pod restarts. You must enable persistent volumes for production deployments.

YAML

fluss:
  tablet:
    storage:
      enabled: true
      size: 500Gi
      storageClass: gp3
zookeeper:
  persistence:
    enabled: true
    storageClass: gp3
    accessModes: ["ReadWriteOnce"]
    size: 8Gi
    dataLogDir:
      size: 8Gi

Replace gp3 with your cluster's storage class.

To list the available classes, run the following command:

BASH

kubectl get storageclasses

Resource Requests and Limits

The Helm chart ships with no resource requests or limits set. You must configure these settings for your production environments:

YAML

fluss:
  resources:
    coordinatorServer:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi
    tabletServer:
      requests:
        cpu: "4"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
zookeeper:
  resources:
    requests:
      memory: 2Gi
      cpu: "1"
    limits:
      memory: 2Gi
      cpu: "1"
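
To check these requests against your cluster capacity, multiply each request by the pod count of the default topology and add the results. The sketch below does that arithmetic for the example values above; total_requests is a hypothetical helper, and the input format (component, replicas, CPU cores, memory in Gi) is chosen only for this illustration.

```shell
# total_requests: reads "component replicas cpu_cores memory_gib" lines on stdin
# and prints the summed CPU and memory requests across all pods.
total_requests() {
  awk '{ cpu += $2 * $3; mem += $2 * $4 } END { printf "%d CPU, %d Gi\n", cpu, mem }'
}

# Default topology with the example requests from this section.
total_requests <<'EOF'
coordinator 1 2 4
tablet 3 4 8
zookeeper 3 1 2
EOF
```

With these values, the seven pods request 17 CPU cores and 34 Gi of memory in total, before any headroom for other workloads.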

Replication and Bucketing

The Helm chart sets the following default values:

YAML

fluss:
  configurationOverrides:
    default.bucket.number: 3
    default.replication.factor: 3

default.replication.factor: Specifies how many tablet server replicas hold a copy of each bucket. This value must not exceed the value that you set for fluss.tablet.numberOfReplicas.
default.bucket.number: Specifies the default number of buckets (shards) per table. Individual tables can override this setting at creation time using the 'bucket.num' table property in the Flink SQL WITH clause. A value equal to, or a multiple of, the tablet server count distributes the load evenly.
fluss.tablet.numberOfReplicas: Controls the number of tablet server pods. The default value is 3. You might lower this value in resource-constrained environments, but you must adjust default.replication.factor accordingly.
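
The constraints above can be checked before you install or upgrade. The following is a minimal sketch under the rules stated in this section; check_fluss_sizing is a hypothetical helper, and you pass it the three values by hand because it does not read values.yaml.

```shell
# check_fluss_sizing TABLET_REPLICAS REPLICATION_FACTOR BUCKET_NUMBER
# Fails when the replication factor exceeds the tablet server count.
# Prints a note when the bucket number is not a multiple of the tablet server count.
check_fluss_sizing() {
  local tablets="$1" replication="$2" buckets="$3"
  if [ "$replication" -gt "$tablets" ]; then
    echo "error: default.replication.factor ($replication) exceeds fluss.tablet.numberOfReplicas ($tablets)" >&2
    return 1
  fi
  if [ $(( buckets % tablets )) -ne 0 ]; then
    echo "note: default.bucket.number ($buckets) is not a multiple of the tablet server count ($tablets)"
  fi
  return 0
}

# The chart defaults: 3 tablet servers, replication factor 3, 3 buckets.
check_fluss_sizing 3 3 3 && echo "sizing ok"
```

For example, lowering fluss.tablet.numberOfReplicas to 2 while leaving the replication factor at 3 makes the check fail, which matches the requirement to adjust both values together.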

Service Account

The Helm chart creates no service account by default (fluss.serviceAccount.create: false). You can either link an existing service account or have the chart create one for you.

YAML

fluss:
  serviceAccount:
    create: true   # set to false to use an existing account
    name: fluss-sa

You need a service account when you bind Fluss pods to a workload identity, such as AWS IRSA or GKE Workload Identity. This binding grants the pods access to cloud resources, such as remote storage.

Uninstall

BASH

helm uninstall fluss --namespace fluss

Uninstalling the release does not delete persistent volume claims. To remove them, delete the claims manually:

BASH

kubectl delete pvc -n fluss --all

Warning

Deleting PVCs is irreversible and permanently removes all stored data.

Troubleshooting

Pods not starting

To troubleshoot issues, you can inspect the pod events and logs by running the following commands:

BASH

kubectl describe pod <POD_NAME> -n fluss
kubectl logs <POD_NAME> -n fluss

Common causes for this issue include a missing image pull secret, insufficient cluster resources, or a PVC provisioning failure.

Pod stuck in ImagePullBackOff or ErrImagePull

To find the precise registry error, inspect the pod events by running the following command:

BASH

kubectl -n fluss describe pod <POD_NAME>

Common causes for this issue include the following items:

Symptom in events	Cause	Fix
pull access denied, unauthorized	Pull secret missing, wrong name, or wrong namespace.	Recreate the secret in the pod's namespace, and then confirm that the pullSecrets field in your values.yaml file references its exact name.
manifest unknown	Tag does not exist.	Confirm that the <TAG> matches the value that Ververica Platform communicated to you.
dial tcp ... i/o timeout	Cluster nodes cannot reach registry.ververica.cloud.	Check egress firewalls, NAT gateways, and any registry mirror configurations on the nodes.

Image pull secret not picked up by the pod

You must set imagePullSecrets on the pod template, not on the namespace itself. Confirm the following details:

The Helm release was installed with both fluss.image.pullSecrets and zookeeper.image.pullSecrets set to the secret name.
The secret exists in the same namespace as the Helm release.

To verify the secret exists, run the following command:

BASH

kubectl -n fluss get secret ververica-registry

A namespace mismatch between the secret and the release is the most common cause of this issue. In this scenario, the chart installs successfully, but pods fail to pull images.

Bundled ZooKeeper pods fail to pull

If the Fluss pods start but the ZooKeeper pods do not, you likely configured the pull secret under fluss.image.pullSecrets but omitted it under zookeeper.image.pullSecrets. Both configuration blocks must reference the secret because the bundled ZooKeeper image is also served from the Ververica Platform registry.

401 Unauthorized or 403 Forbidden on helm registry login or helm pull

Verify that your username and password are correct and do not contain leading or trailing whitespace.
Confirm that your credentials were issued specifically for the Fluss projects on the registry, rather than for an unrelated Ververica Platform product.
Run helm registry logout registry.ververica.cloud to clear stale cached credentials, and then log in again.
Run docker logout registry.ververica.cloud to clear stale cached Docker credentials, and then log in again.
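
Stray whitespace in an exported credential is easy to miss by eye. The following sketch checks the two variables this manual assumes you have exported; has_edge_whitespace is a hypothetical helper name and is not part of any Ververica tooling.

```shell
# has_edge_whitespace VALUE: succeeds when VALUE starts or ends with whitespace.
has_edge_whitespace() {
  case "$1" in
    [[:space:]]*|*[[:space:]]) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn about each credential variable that carries leading or trailing whitespace.
if has_edge_whitespace "${REGISTRY_USERNAME:-}"; then
  echo "REGISTRY_USERNAME has leading or trailing whitespace" >&2
fi
if has_edge_whitespace "${REGISTRY_PASSWORD:-}"; then
  echo "REGISTRY_PASSWORD has leading or trailing whitespace" >&2
fi
```

A common source of this problem is copying a password together with a trailing newline or space from a password manager or email.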

Complete values.yaml example

The following example shows a production-ready values.yaml file that combines all the configuration settings described in this manual:

YAML

fluss:
  image:
    pullSecrets:
      - ververica-registry
  configurationOverrides:
    default.bucket.number: 3
    default.replication.factor: 3
  coordinator:
    numberOfReplicas: 1
  tablet:
    numberOfReplicas: 3
    storage:
      enabled: true
      size: 500Gi
      storageClass: gp3
  resources:
    coordinatorServer:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi
    tabletServer:
      requests:
        cpu: "4"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
zookeeper:
  image:
    pullSecrets:
      - ververica-registry
  persistence:
    enabled: true
    storageClass: gp3
    accessModes: ["ReadWriteOnce"]
    size: 8Gi
    dataLogDir:
      size: 8Gi
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      cpu: "1"
      memory: 2Gi