Universal Blob Storage
Ververica Platform provides centralized configuration of blob storage for its services.
- Storage Providers
- Additional Provider Configuration
- Advanced Configuration
To enable universal blob storage, configure a base URI for your blob storage provider. Add the following snippet to your Helm values.yaml file:

    vvp:
      blobStorage:
        baseUri: s3://my-bucket/vvp
|Storage Provider||Scheme||Artifact Management||State Snapshots|
|Flink 1.16||Flink 1.15||Flink 1.14||Flink 1.13||Flink 1.12|
|Apache Hadoop® HDFS||
|Microsoft ABS Workload Identity||
(✓): With custom Flink image
“*” : With VVP Flink image
Some supported storage providers have additional options that can be configured in the
blobStorage section of the
values.yaml file, scoped by provider.
The following is a complete listing of supported additional storage provider configuration options:
    blobStorage:
      s3:
        endpoint: ""
        region: ""
      oss:
        endpoint: ""
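As a rough sketch of how these options might combine with the base URI, the fragment below points an S3 base URI at a custom endpoint. It assumes the provider options sit in the same `vvp.blobStorage` section as `baseUri`; the endpoint and region values are hypothetical placeholders, not values from this documentation.

```yaml
vvp:
  blobStorage:
    baseUri: s3://my-bucket/vvp
    s3:
      # Hypothetical endpoint of an S3-compatible service; replace with your own.
      endpoint: "https://s3.example.internal:9000"
      # Hypothetical region; replace with the region of your bucket.
      region: "eu-central-1"
```

Options for providers you do not use can simply be left out.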
For Microsoft ABS Workload Identity, add the following snippet to your Helm values.yaml file:

    vvp:
      blobStorage:
        baseUri: wiaz://<blob-container-name>@<your account name>.blob.core.windows.net/<path>
You do not need to provide any credentials to set up access to Azure Blob Storage using Microsoft ABS Workload Identity. You just need to provide your Azure client-id and optionally the tenant-id.
    workloadIdentity:
      azure:
        clientId: xxxx-xxxx-xxxx-xxxx
        # tenantId is optional
        tenantId: yyyy-yyyy-yyyy-yyyy
If you want to run Flink jobs in a namespace other than the one Ververica Platform itself runs in (the recommended way), you need to create a Kubernetes service account in that namespace and a federated identity for your Azure principal yourself.
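A minimal sketch of such a service account is shown below. The name `flink-jobs` and the namespace `flink-workloads` are hypothetical; the annotations are the ones used by Azure AD Workload Identity to associate a service account with a client ID and, optionally, a tenant ID.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flink-jobs              # hypothetical service account name
  namespace: flink-workloads    # hypothetical namespace for Flink jobs
  annotations:
    azure.workload.identity/client-id: xxxx-xxxx-xxxx-xxxx
    # Optional: only needed if the tenant differs from the cluster default.
    azure.workload.identity/tenant-id: yyyy-yyyy-yyyy-yyyy
```

With these assumed names, the federated identity credential on the Azure side would use the subject `system:serviceaccount:flink-workloads:flink-jobs`.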
Deployments must also reference the service account. To do this, assign the service account name to the pods as follows:
    spec:
      template:
        spec:
          kubernetes:
            pods:
              labels:
                azure.workload.identity/use: 'true'
              serviceAccountName: ververica-platform-ververica-platform
It is also possible to configure the JobManager and TaskManager pods independently. If you prefer this option, use the following configuration:
    spec:
      template:
        spec:
          kubernetes:
            jobManagerPodTemplate:
              metadata:
                labels:
                  azure.workload.identity/use: 'true'
              spec:
                serviceAccountName: ververica-platform-ververica-platform
            taskManagerPodTemplate:
              metadata:
                labels:
                  azure.workload.identity/use: 'true'
              spec:
                serviceAccountName: ververica-platform-ververica-platform
Please be aware that you cannot mix configuration methods. You can specify either the pods attribute or the pod template attributes (jobManagerPodTemplate and taskManagerPodTemplate), but not both.
If you have created your own namespace and a related service account dedicated to deployments, you need to replace serviceAccountName: ververica-platform-ververica-platform with your own service account name.
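As a sketch of that replacement, the fragment below reuses the pods-based configuration with a hypothetical service account named `flink-jobs`. Both the name and the placement of `serviceAccountName` directly under `pods` are assumptions for illustration; substitute the account you actually created.

```yaml
spec:
  template:
    spec:
      kubernetes:
        pods:
          labels:
            azure.workload.identity/use: 'true'
          # Hypothetical name: use the service account from your own namespace.
          serviceAccountName: flink-jobs
```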
Ververica Platform supports using a single set of credentials to access your configured blob storage, and will automatically distribute these credentials to Flink jobs that require them.
These credentials can be either specified directly in
values.yaml, or added to a Kubernetes
secret out-of-band and referenced in
values.yaml by name.
The following is a complete listing of the credentials that can be given for each storage provider, with example values:
    blobStorageCredentials:
      azure:
        connectionString: DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=vvpArtifacts;AccountKey=VGhpcyBpcyBub3QgYSB2YWxpZCBBQlMga2V5LiAgVGhhbmtzIGZvciB0aG9yb3VnaGx5IHJlYWRpbmcgdGhlIGRvY3MgOikgIA==;
      s3:
        accessKeyId: AKIAEXAMPLEACCESSKEY
        secretAccessKey: qyRRoU+/4d5yYzOGZVz7P9ay9fAAMrexamplesecretkey
      hdfs:
        # Apache Hadoop® configuration files (core-site.xml, hdfs-site.xml)
        # and optional Kerberos configuration files. Note that the keytab
        # has to be base64 encoded.
        core-site.xml: |
          <?xml version="1.0" ?>
          <configuration>
            ...
          </configuration>
        hdfs-site.xml: |
          <?xml version="1.0" ?>
          <configuration>
            ...
          </configuration>
        krb5.conf: |
          [libdefaults]
          ticket_lifetime = 10h
          ...
        keytab: BQIAA...AAAC
        keytab-principal: flink
      http:
        basicAuthUser: <user>
        basicAuthPassword: <pass>
        trustStoreFilePath: <path-to-file>
        trustStorePassword: <store-pass>
        trustStoreType: <key-store-type-eg-JKS>
To use a pre-created Kubernetes secret, its keys must match the pattern <provider>.<key>, for example s3.secretAccessKey. To configure Ververica Platform to use this secret, add the following snippet to your Helm values.yaml file:
    blobStorageCredentials:
      existingSecret: my-blob-storage-credentials
The values in a Kubernetes secret must be base64-encoded.
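For illustration, a secret for S3 credentials might look like the sketch below, which reuses the example values from the credentials listing. It uses the standard Kubernetes `stringData` field, which accepts plain-text values and lets Kubernetes handle the encoding on write; values placed under `data` instead would have to be base64-encoded by hand. That the secret should live in the namespace where Ververica Platform is installed is an assumption here.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-blob-storage-credentials
type: Opaque
# stringData takes plain text; Kubernetes stores it base64-encoded under data.
stringData:
  s3.accessKeyId: AKIAEXAMPLEACCESSKEY
  s3.secretAccessKey: qyRRoU+/4d5yYzOGZVz7P9ay9fAAMrexamplesecretkey
```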
For UBS with Apache Hadoop® HDFS, we recommend pre-creating a Kubernetes secret with the required configuration files to avoid duplicating them in the Ververica Platform values.yaml file.
    kubectl create secret generic my-blob-storage-credentials \
      --from-file hdfs.core-site.xml=core-site.xml \
      --from-file hdfs.hdfs-site.xml=hdfs-site.xml \
      --from-file hdfs.krb5.conf=krb5.conf \
      --from-file hdfs.keytab=keytab \
      --from-file hdfs.keytab-principal=keytab-principal
After you have created the Kubernetes secret, you can reference it in the values.yaml as an existing secret. Note that the Kerberos configuration is optional.
An alternative way to provide credentials securely to VVP is to access the credentials as mounted files.
To do so, each security key must be configured via a separate file, and the files must be named following the pattern <provider>.<key>. For example:

    $ cat ./http.basicAuthUser
    admin
    $ cat ./http.basicAuthPassword
    password
The directory that contains the credentials files must then be mounted. Assuming the files are under the path /conf/blob-creds, the directory can be referenced either using environment variables or using VVP properties. In both cases, the setting is made in the values.yaml file:
- Using environment variables:
    env:
      - name: "vvp.blob-storage.credentials-dir"
        value: "/conf/blob-creds"
- Using VVP properties:
    vvp:
      blobStorage:
        credentialsDir: /conf/blob-creds
You can choose any appropriate name for the mounted directory, but the credentials filenames must exactly follow the pattern <provider>.<key>, for example http.basicAuthUser.
When running on AWS EKS or AWS ECS, your Kubernetes Pods inherit the roles attached to the underlying EC2 instances. If these roles already grant access to the required S3 resources, you only need to configure vvp.blobStorage.baseUri without configuring any blobStorageCredentials.
UBS with Apache Hadoop® HDFS uses a Hadoop 2 client for communication with the HDFS cluster. Hadoop 3 preserves wire compatibility with Hadoop 2 clients and you are able to use HDFS blob storage with both Hadoop 2 and Hadoop 3 HDFS clusters.
Note, however, that there may be incompatibilities between Hadoop 2 and 3 with respect to the configuration files core-site.xml and hdfs-site.xml. For example, Hadoop 3 allows durations to be configured with a unit suffix such as 30s, which results in a configuration parsing error with Hadoop 2 clients. It is generally possible to work around these issues by limiting the configuration to Hadoop 2 compatible keys and values.
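The fragment below sketches what such a workaround could look like for a single property. The key `dfs.client.datanode-restart.timeout` is chosen only as an illustration of a duration setting, and the assumption that a plain number is interpreted in seconds should be checked against the documentation of your Hadoop version.

```yaml
blobStorageCredentials:
  hdfs:
    hdfs-site.xml: |
      <?xml version="1.0" ?>
      <configuration>
        <property>
          <name>dfs.client.datanode-restart.timeout</name>
          <!-- A Hadoop 3 style value such as "30s" would fail to parse in a
               Hadoop 2 client; a plain number avoids the unit suffix. -->
          <value>30</value>
        </property>
      </configuration>
```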
The following services make use of the universal blob storage configuration.
Artifacts are stored in the following location:
The SQL Service depends on blob storage for storing deployment information and JAR files of user-defined functions.
Before a SQL query can be deployed it needs to be optimized and translated to a Flink job. SQL Service stores the Flink job and all JAR files that contain an implementation of a user-defined function which is used by the query at the following locations:
|UDF JAR Files||
After a query has been deployed, Application Manager maintains the same blobs as for regular Flink jobs, i.e., checkpoints, savepoints, and high-availability files.
The JAR files of UDF Artifacts that are uploaded via the UI are stored in the following location: