Lake Job Credentials

Applies to: BYOC, Self-Managed v3

This document guides operators on how to configure object-store credentials for Apache Flink jobs using the Fluss connector and Iceberg lake plugins. It covers credential delivery across various runners, including the Flink Kubernetes Operator, standard CLI/session clusters, and Ververica Platform 3.

Overview

An Apache Flink job that accesses the lake tier of a Fluss table reads historical data directly from the underlying object store. The job reads Iceberg metadata and data files using Iceberg's HadoopFileIO. It also reads tiered Fluss segments using the Fluss filesystem plugin (fs-s3 or fs-azure).

Neither path participates in the Fluss delegation-token mechanism. Because of this, the job must carry its own object-store credentials. The method you use to deliver these credentials depends on your runner. Review the sections below for details.

When you use Flink-native runners (such as the Kubernetes Operator, CLI, or session and application clusters), you can theoretically supply credentials through several channels. These channels include Flink configuration entries, environment variables, IAM Roles for Service Accounts (IRSA), or a mounted credentials provider.

However, placing a Hadoop core-site.xml file on the JobManager and TaskManager classpath is the most portable and stable approach across runners and across both code paths. This document focuses on that method.

Both code paths run inside the same TaskManager JVM and read the same core-site.xml file:

The Iceberg reader and writer, through Iceberg's HadoopFileIO, construct their configuration by loading core-site.xml from the JVM classpath. Hadoop registers core-site.xml as a default classpath resource.
The Fluss filesystem plugin (fs-s3 or fs-azure) constructs a bare org.apache.hadoop.conf.Configuration object, which picks up the same file from the same place.

Therefore, a single file shipped with the Flink job supplies credentials to both paths. You do not need to inject Hadoop properties through a second channel. Flink configuration hadoop.* entries do not reach the Iceberg configuration, and the Fluss filesystem plugin does not read the Flink configuration either.

This behavior applies to both Hadoop and REST catalog flavors. REST catalog flavors still require a core-site.xml file, even when the catalog vends per-table scoped credentials at table-load time. The initial metadata write and any catalog-side bootstrap tasks might fail without it, as they fall back to the static core-site.xml keys.

Other runners use a different channel. On Ververica Platform 3, lake jobs do not use a core-site.xml file. Instead, credentials reach the job through the platform's injected AWS credentials; see On Ververica Platform 3 below.

The Java SDK carries object-store credentials on the Fluss configuration instead. For more details, see Reading and Writing Fluss › On NooBaa or another STS-less S3-compatible store.

Shipping the File

When you use the Flink Kubernetes Operator, which is the recommended approach, render your core-site.xml file into a Kubernetes ConfigMap. You can then reference it from your flinkConfiguration:

YAML

flinkConfiguration:
  kubernetes.hadoop.conf.config-map.name: <CORE_SITE_CONFIGMAP_NAME>

Flink's HadoopConfMountDecorator, part of the native Kubernetes integration that the operator drives, mounts the ConfigMap at /opt/hadoop/conf on the JobManager and TaskManager pods and sets HADOOP_CONF_DIR automatically. Both Iceberg's new Configuration() object and the Fluss filesystem plugins pick up core-site.xml from that location with no further action.
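The ConfigMap itself can be rendered in several ways. The following is a minimal Python sketch, not a prescribed workflow: the function name, the ConfigMap name `fluss-lake-core-site`, and the `default` namespace are hypothetical. It assumes the ConfigMap key must be named core-site.xml so that the mounted file carries that name.

```python
import json


def build_hadoop_conf_configmap(name, core_site_xml, namespace="default"):
    """Return a ConfigMap manifest that carries core-site.xml as its only key."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        # Assumption: the key becomes the file name under the mount path.
        "data": {"core-site.xml": core_site_xml},
    }


if __name__ == "__main__":
    # Placeholder content; substitute the per-flavor properties from this page.
    xml = "<configuration>\n</configuration>\n"
    manifest = build_hadoop_conf_configmap("fluss-lake-core-site", xml)
    # kubectl accepts JSON manifests, so the output can be piped to `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

The name passed to the function is the value to use for kubernetes.hadoop.conf.config-map.name. Because the file can contain secret keys, treat the rendered manifest as sensitive.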

If you use other Flink runners, such as a session cluster, standalone deployment, or application mode without the operator, place the core-site.xml file on disk on every JobManager and TaskManager. Common locations include /etc/hadoop/conf/ or a custom path of your choice. You must then set HADOOP_CONF_DIR to that directory in your Flink launch environment. Use your runner's standard mechanism for shipping configuration files, such as Helm chart values, a container image layer, an init container, or a sidecar volume.
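Before launching a job on such a runner, it can help to confirm that the file is where the JVM will look for it. The following Python sketch is one hypothetical pre-flight check, not part of Flink or Fluss: the function name is illustrative, and it only verifies that HADOOP_CONF_DIR is set and that core-site.xml parses.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def core_site_property_names(env=None):
    """Return the property names found in $HADOOP_CONF_DIR/core-site.xml."""
    env = os.environ if env is None else env
    conf_dir = env.get("HADOOP_CONF_DIR")
    if not conf_dir:
        raise RuntimeError("HADOOP_CONF_DIR is not set")
    path = Path(conf_dir) / "core-site.xml"
    if not path.is_file():
        raise FileNotFoundError(f"{path} does not exist")
    root = ET.parse(str(path)).getroot()
    if root.tag != "configuration":
        raise ValueError(f"unexpected root element <{root.tag}>")
    # Report names only, so secret values never reach logs.
    return [p.findtext("name", default="").strip() for p in root.findall("property")]


if __name__ == "__main__":
    # Demonstrate against a throwaway directory instead of a real cluster path.
    with tempfile.TemporaryDirectory() as conf_dir:
        Path(conf_dir, "core-site.xml").write_text(
            "<configuration><property><name>fs.s3a.region</name>"
            "<value>eu-west-1</value></property></configuration>"
        )
        print(core_site_property_names({"HADOOP_CONF_DIR": conf_dir}))
```

Running the check on each JobManager and TaskManager host (or in the container image build) catches a missing or malformed file before the job fails at its first object-store access.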

Per-Flavor Content

On Amazon S3 (Hadoop catalog)

XML

<configuration>
  <property><name>fs.s3a.access.key</name><value><AWS_ACCESS_KEY_ID></value></property>
  <property><name>fs.s3a.secret.key</name><value><AWS_SECRET_ACCESS_KEY></value></property>
  <property><name>fs.s3a.region</name><value><AWS_REGION></value></property>
</configuration>

If you authenticate using IRSA, omit the static access.key and secret.key properties. The Flink pod's ServiceAccount must be annotated with eks.amazonaws.com/role-arn pointing at an IAM role that has access to the warehouse bucket.

On Flink 1.20, IRSA also requires an explicit credentials provider. Hadoop 3.3.4 (bundled with Flink 1.20) does not include WebIdentityTokenCredentialsProvider in its default S3A chain, so you must add the provider explicitly. Newer Flink and Hadoop versions whose default chain already covers IRSA do not need this override.

XML

<configuration>
  <property>
    <name>fs.s3a.aws.credentials.provider</name>
    <value>com.amazonaws.auth.WebIdentityTokenCredentialsProvider,com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
  </property>
  <property><name>fs.s3a.region</name><value><AWS_REGION></value></property>
</configuration>

If you prefer static credentials as an alternative to IRSA, you have two equivalent options.

The first option is to put them directly in your core-site.xml file:

XML

<configuration>
  <property><name>fs.s3a.access.key</name><value><AWS_ACCESS_KEY_ID></value></property>
  <property><name>fs.s3a.secret.key</name><value><AWS_SECRET_ACCESS_KEY></value></property>
  <property><name>fs.s3a.region</name><value><AWS_REGION></value></property>
</configuration>

The second option is to set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as environment variables on your JobManager and TaskManager pods. Hadoop's default S3A credential chain picks them up through the EnvironmentVariableCredentialsProvider automatically.

If you use this method, your core-site.xml file only needs to contain the region and any non-default S3 endpoint details.
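As an illustration of how small the file becomes with environment-variable credentials, the following Python sketch renders a core-site.xml that holds only the region and, optionally, an endpoint. Both function names are hypothetical helpers, not part of any Fluss or Flink tooling.

```python
import xml.etree.ElementTree as ET


def render_core_site(properties):
    """Serialize name/value pairs into Hadoop's <configuration> XML layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode") + "\n"


def env_credentials_core_site(region, endpoint=None):
    """Region plus optional endpoint; keys come from the AWS_* environment variables."""
    props = {"fs.s3a.region": region}
    if endpoint:
        props["fs.s3a.endpoint"] = endpoint
    return render_core_site(props)


if __name__ == "__main__":
    print(env_credentials_core_site("eu-west-1"))
```

No access key or secret key appears in the output, so the rendered file can be stored with fewer restrictions than one that embeds static credentials.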

On Azure Blob Storage (Hadoop catalog)

XML

<configuration>
  <property>
    <name>fs.azure.account.key.<STORAGE_ACCOUNT>.dfs.core.windows.net</name>
    <value><STORAGE_ACCOUNT_KEY></value>
  </property>
</configuration>

On NooBaa or another S3-compatible store (Hadoop catalog)

If you use an STS-less S3-compatible endpoint, configure your core-site.xml file with per-bucket access keys using the following format:

XML

<configuration>
  <property><name>fs.s3a.endpoint</name><value><S3_ENDPOINT></value></property>
  <property><name>fs.s3a.region</name><value>us-east-1</value></property>
  <property><name>fs.s3a.path.style.access</name><value>true</value></property>
  <property><name>fs.s3a.aws.credentials.provider</name>
    <value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</value></property>
  <!-- Per-bucket credentials. The job touches both the remote-data bucket
       (where tiered Fluss segments live) and the lake warehouse bucket
       (where Iceberg files live). On NooBaa each Object Bucket Claim is
       provisioned with its own AKSK pair, so the two buckets need separate
       credentials. Hadoop's per-bucket overrides route the right AKSK to each. -->
  <property><name>fs.s3a.bucket.<REMOTE_BUCKET>.access.key</name><value><REMOTE_ACCESS_KEY_ID></value></property>
  <property><name>fs.s3a.bucket.<REMOTE_BUCKET>.secret.key</name><value><REMOTE_SECRET_ACCESS_KEY></value></property>
  <property><name>fs.s3a.bucket.<LAKE_BUCKET>.access.key</name><value><LAKE_ACCESS_KEY_ID></value></property>
  <property><name>fs.s3a.bucket.<LAKE_BUCKET>.secret.key</name><value><LAKE_SECRET_ACCESS_KEY></value></property>
</configuration>

The fs.s3a.path.style.access property and the explicit SimpleAWSCredentialsProvider are required for S3-compatible stores. The fs.s3a.region property is required by the S3 client, but NooBaa-class services do not enforce its value. On stores where a single principal spans both buckets, replace the four per-bucket properties with the global fs.s3a.access.key and fs.s3a.secret.key properties.
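The per-bucket naming pattern is easy to mistype by hand. The following Python sketch shows one way to derive the property names from a mapping of buckets to key pairs. The function name, bucket names, and key values are hypothetical; only the fs.s3a.* property names come from the configuration above.

```python
def s3_compatible_properties(endpoint, bucket_credentials, region="us-east-1"):
    """Build fs.s3a.* properties for an STS-less S3-compatible store.

    bucket_credentials maps a bucket name to an (access_key, secret_key) pair.
    """
    props = {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.region": region,
        "fs.s3a.path.style.access": "true",
        "fs.s3a.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
    }
    # One access.key/secret.key pair per bucket, using Hadoop's per-bucket override names.
    for bucket, (access_key, secret_key) in bucket_credentials.items():
        props[f"fs.s3a.bucket.{bucket}.access.key"] = access_key
        props[f"fs.s3a.bucket.{bucket}.secret.key"] = secret_key
    return props


if __name__ == "__main__":
    # Illustrative values only.
    props = s3_compatible_properties(
        "https://s3.example.internal",
        {"fluss-remote": ("REMOTE_AK", "REMOTE_SK"), "fluss-lake": ("LAKE_AK", "LAKE_SK")},
    )
    for name in props:
        print(name)
```

The resulting dictionary holds four per-bucket entries for two buckets, matching the layout of the XML above. It still has to be serialized into core-site.xml by whatever templating step you already use.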

The Java SDK does not have core-site.xml on its classpath and uses a different channel instead. For more details, see Reading and Writing Fluss › On NooBaa or another STS-less S3-compatible store.

On Amazon S3 (REST catalog)

The Iceberg REST catalog (such as Apache Polaris) mediates metadata commits, but Iceberg's HadoopFileIO still requires S3 credentials to write the underlying data files. You must supply these credentials through the same core-site.xml structure shown in the On Amazon S3 (Hadoop catalog) section above.

If your REST catalog vends STS-scoped credentials at table-load time, which Apache Polaris does in its default configuration, the catalog's vended credentials take precedence over the static core-site.xml keys for any given table operation. The static keys remain useful as a fallback for catalogs that do not vend credentials and for the initial metadata write before scoped credentials become available.

On Ververica Platform 3 (VVP3)

Ververica Platform 3 does not use a core-site.xml file. It injects platform AWS credentials into every deployment, but those credentials cover the Ververica Platform 3 internal blob storage, not your Fluss lake bucket. Because of this, a tiering or lake-reading deployment that resolves credentials through the AWS SDK default chain fails with an AccessDenied error against the lake bucket.

You must grant access by attaching a policy with read-write permissions on the lake bucket to the IAM principal that backs those injected credentials:

JSON

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::<LAKE_BUCKET>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::<LAKE_BUCKET>/*"
    }
  ]
}

The AWS SDK default credential chain reads the injected credentials first, so both Flink's S3 filesystem plugin and Iceberg's HadoopFileIO pick up the widened permissions automatically. This means you do not need a core-site.xml file or any per-deployment credential configuration. For the Ververica Platform 3 deployment workflow these credentials apply to, see Running Lakehouse (Iceberg) Jobs against Fluss › Running Jobs on Ververica Platform 3.
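If you manage several lake buckets, the policy document can be generated instead of edited by hand. The following Python sketch reproduces the policy shown above for a given bucket name. The function name and the example bucket `my-fluss-lake` are hypothetical, and the sketch assumes the standard `aws` partition used in the ARNs above.

```python
import json


def lake_bucket_policy(bucket):
    """Return the read-write policy document for one lake bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to every key under the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(lake_bucket_policy("my-fluss-lake"), indent=2))
```

The split into two statements mirrors the original policy: the list and location actions target the bucket, while the object actions target the keys inside it.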

On Ververica Cloud (BYOC)

BYOC also does not use a core-site.xml file. On BYOC, Flink workloads run in your own cloud account, and the job pods carry no per-workload IAM identity (no IRSA service-account role). Their S3 clients therefore fall back to the IAM role attached to the compute nodes the jobs run on. That node role grants access to BYOC's own runtime-artifact storage, not your Fluss lake bucket. As a result, a tiering or lake-reading deployment that resolves credentials through the AWS SDK default chain fails with an AccessDenied error against the lake bucket.

Grant access by attaching a policy with read-write permissions on the lake bucket to the compute node IAM role:

JSON

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::<LAKE_BUCKET>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::<LAKE_BUCKET>/*"
    }
  ]
}

The AWS SDK default credential chain resolves the node role automatically, so both Flink's S3 filesystem plugin and Iceberg's HadoopFileIO pick up the widened permissions. You do not need a core-site.xml file or any per-deployment credential configuration. For the BYOC deployment workflow these credentials apply to, see Running Lakehouse (Iceberg) Jobs against Fluss › Running Jobs on Ververica Cloud (BYOC).