Manage Apache Paimon Catalogs
After you configure an Apache Paimon catalog, you can directly access Apache Paimon tables that are stored in S3 buckets in Ververica Cloud. This topic describes how to create, view, use, and delete an Apache Paimon catalog in Ververica Cloud.
Background Information
Apache Paimon is a unified lake storage format that allows you to process data in streaming and batch modes. Apache Paimon supports high-throughput data writing and low-latency data queries. You can use Apache Paimon to efficiently deploy your own data lake storage service on a cloud-based object storage platform, enabling seamless data lake analytics. For more information, see Apache Paimon.
Apache Paimon catalogs allow you to manage Apache Paimon tables that are stored in S3 buckets. The created tables can also be accessed by using other compute engines. This topic describes the following operations that you can perform to manage Apache Paimon catalogs:
- Create an Apache Paimon catalog
- View an Apache Paimon catalog
- Use an Apache Paimon catalog
- Delete an Apache Paimon catalog
Prerequisites
- Ververica Cloud S3 is activated.
  Note: You can use the S3 bucket that you specified when you activated the Ververica Cloud service. However, to better distinguish data and prevent misoperations, Ververica recommends that you create and use an S3 bucket that resides in the same region as Ververica Cloud.
- A private connection is set up. For more information, see Apache Paimon connector and Amazon S3.
Limits
Only Ververica Cloud with a VERA engine compatible with Flink 1.17 or later supports Apache Paimon catalogs. The S3 bucket that is used by an Apache Paimon catalog must reside in the same region as Ververica Cloud.
Precautions
After you execute an SQL statement to create or delete a catalog, database, or table, you cannot immediately view the changes on the Catalogs page due to the cache mechanism of Ververica Cloud. To view the changes after you perform the operation, you must click the Refresh icon in the Catalogs pane of the Catalogs page.
Create a Paimon Catalog in the UI
- On the Dashboard page, open the console for the workspace you want to manage.
- In the Console navigation pane, click Catalogs.
- On the Catalog list page, click Create Catalog.
- In the Create Catalog dialog box, select Paimon and click Next.
- Configure the parameters.
  Caution: After you create a Paimon catalog, the parameter configuration cannot be modified. If you want to modify the parameter configuration, you must delete the Paimon catalog that you created and create a Paimon catalog again.

  | Parameter | Description | Required |
  | --- | --- | --- |
  | catalog name | Catalog name. | Yes |
  | metastore | Metadata storage type (select filesystem). | Yes |
  | warehouse | The data warehouse directory specified in the S3 service. See Warehouse Directory Format below. | Yes |

- Click Confirm. You can see the created catalog on the Catalog list page in the Catalogs tab.
Warehouse Directory Format
The format is s3a://<bucket>/<object>.
Parameters in the path:
- bucket: indicates the name of the S3 bucket that you created.
- object: indicates the path in which your data is stored. You can view the names of your bucket and object in the S3 console.
Note:
- A private connection must be set up. For more information, see Apache Paimon connector.
- The warehouse path must start with the s3a protocol (e.g., s3a://my-bucket/paimon/warehouse).
- The trailing slash (/) must be removed from the S3 URI (e.g., use s3a://my-bucket/paimon/warehouse and not s3a://my-bucket/paimon/warehouse/).
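For reference, the UI parameters described above correspond to the options of a Paimon catalog definition in Flink SQL. The following is a minimal sketch, assuming that your workspace allows catalogs to be created from a SQL draft. The catalog name my_paimon_catalog and the bucket path are hypothetical placeholders, not values taken from this topic.

```sql
-- Hypothetical catalog name and warehouse path; replace them with your own values.
CREATE CATALOG my_paimon_catalog WITH (
  'type' = 'paimon',                                -- identifies the Apache Paimon catalog type
  'metastore' = 'filesystem',                       -- same value as the metastore parameter in the UI
  'warehouse' = 's3a://my-bucket/paimon/warehouse'  -- s3a protocol, no trailing slash
);
```

As with a catalog created in the UI, click the Refresh icon in the Catalogs pane if the new catalog does not appear immediately.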
View an Apache Paimon Catalog
After you create an Apache Paimon catalog, you can perform the following steps to view the metadata of the Apache Paimon catalog.
- On the Dashboard page, open the console for the workspace you want to manage.
- In the Console navigation pane, click Catalogs.
- On the Catalog List page, find the desired catalog and view the Name and Type columns of the catalog.
If you want to view the databases and tables in the catalog, click View in the Actions column.
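The same metadata can usually be inspected from a SQL draft by using standard Flink SQL statements. The following is a minimal sketch, assuming a catalog named paimoncatalog that already contains a database test_db and a table test_tbl. These names are placeholders that match the samples used elsewhere in this topic.

```sql
SHOW CATALOGS;               -- list the catalogs that are registered in the workspace
USE CATALOG paimoncatalog;   -- switch to the Apache Paimon catalog (placeholder name)
SHOW DATABASES;              -- list the databases in the catalog
USE test_db;                 -- switch to a database (placeholder name)
SHOW TABLES;                 -- list the tables in the database
DESCRIBE test_tbl;           -- show the columns and types of a table
```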
Use an Apache Paimon Catalog
Create a Database and a Table
After the Apache Paimon catalog is configured, you can reference tables of the Apache Paimon catalog as result tables and dimension tables in deployments.
In an SQL statement, you can use a table name of the Apache Paimon catalog in the following complete format:
${Paimon-catalog-name}.${Paimon-db-name}.${Paimon-table-name}
You can also execute the USE CATALOG ${Paimon-catalog-name} and USE ${Paimon-db-name} statements to declare the catalog name and database name, and then use only the table name in the ${Paimon-table-name} format in the SQL statement.
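To illustrate the two naming styles, the following sketch runs the same query twice, once with the complete table name and once after declaring the catalog and database. The names paimoncatalog, test_db, and test_tbl are placeholders and assume that the table already exists.

```sql
-- Style 1: complete name in the catalog.database.table format.
SELECT dt, id, data
FROM paimoncatalog.test_db.test_tbl;

-- Style 2: declare the catalog and database first, then use only the table name.
USE CATALOG paimoncatalog;
USE test_db;

SELECT dt, id, data
FROM test_tbl;
```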
Use an Apache Paimon Catalog on the UI
- On the Dashboard page, open the console for the workspace you want to manage.
- In the Console navigation pane, click Catalogs.
- Find the desired catalog and click View in the Actions column.
- On the page that appears, click Create Table.
- On the Built-in tab of the Create Table dialog box, click Apache Paimon and click Next.
- Enter the table creation statement and configure related parameters. Sample code:
CREATE TABLE <catalog name>.test_db.test_tbl (
dt STRING,
id BIGINT,
data STRING,
PRIMARY KEY (dt, id) NOT ENFORCED
) PARTITIONED BY (dt);
- Click Confirm.
For more information about the parameters and usage of Apache Paimon tables, see Apache Paimon connector.
Use an Apache Paimon Catalog by Executing an SQL Statement
- Create a blank streaming draft. For more information, see Create a SQL draft.
- In the code editor, enter the table creation statement.
CREATE DATABASE paimoncatalog.test_db;
CREATE TABLE paimoncatalog.test_db.test_tbl (
dt STRING,
id BIGINT,
data STRING,
PRIMARY KEY (dt, id) NOT ENFORCED
) PARTITIONED BY (dt);
- Select the table creation statements and click Run, which appears on the left side of the code.
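After the table is created, it can be referenced in a deployment, for example as a dimension table in a lookup join. The following is a sketch under stated assumptions: the orders and enriched_orders tables are hypothetical, and the datagen and print connectors are assumed to be available in your workspace. Because datagen produces random values, the join is unlikely to match any rows; the example only shows the shape of the statement.

```sql
-- Hypothetical source table that generates random rows for illustration.
CREATE TEMPORARY TABLE orders (
  order_id BIGINT,
  dt STRING,
  id BIGINT,
  proc_time AS PROCTIME()          -- processing-time attribute required by the lookup join
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '1'
);

-- Hypothetical sink that prints the joined rows to the task logs.
CREATE TEMPORARY TABLE enriched_orders (
  order_id BIGINT,
  data STRING
) WITH (
  'connector' = 'print'
);

-- Use the Apache Paimon table as a dimension table, joined on its primary key.
INSERT INTO enriched_orders
SELECT o.order_id, d.data
FROM orders AS o
JOIN paimoncatalog.test_db.test_tbl FOR SYSTEM_TIME AS OF o.proc_time AS d
  ON o.dt = d.dt AND o.id = d.id;
```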
Use an Apache Paimon Catalog as the Catalog of the Destination Store that is Used in the CREATE TABLE AS Statement
If the engine version of Ververica Cloud is vera-1.0.3-flink-1.13 or later, you can use an Apache Paimon catalog as the catalog of the destination store that is used in the CREATE TABLE AS statement.
CREATE TABLE IF NOT EXISTS `<catalog name>`.`<db name>`.`<table name>`
WITH (
'bucket' = '4' -- Specify the number of buckets for the result table.
) AS TABLE `<source table>`;
The CREATE TABLE AS statement allows you to configure physical table properties in the WITH clause. When you create a destination table, you can configure these properties for the table. For more information about the table properties supported by Apache Paimon catalogs, see Apache Paimon connector.
When you execute the CREATE TABLE AS statement, you may need to change the data type precision for the existing fields. For example, you can change the data type precision from VARCHAR(10) to VARCHAR(20).
Use an Apache Paimon Catalog as the Catalog of the Destination Store that is Used in the CREATE DATABASE AS Statement
If the engine version of Ververica Cloud is vera-1.0.3-flink-1.13 or later, you can use an Apache Paimon catalog as the catalog of the destination store that is used in the CREATE DATABASE AS statement.
CREATE DATABASE IF NOT EXISTS `<catalog name>`.`<db name>`
WITH (
'bucket' = '4' -- Specify the number of buckets for each result table.
) AS DATABASE `<source database>`;
The CREATE DATABASE AS statement allows you to configure physical table properties in the WITH clause for a deployment. When the deployment starts, these parameters take effect on the result tables to which you want to synchronize data. For more information about the table properties supported by Apache Paimon catalogs, see Apache Paimon connector.
Delete an Apache Paimon Catalog
Delete an Apache Paimon Catalog on the UI
- Log on to Ververica Cloud.
- On the Dashboard, click the name of the workspace that you want to manage to display the Console.
- In the left-side navigation pane, click Catalogs.
- Find the desired catalog and click Delete in the Actions column.
- In the message that appears, click Delete.
- View the Catalogs pane on the left side of the Catalog List page to check whether the catalog is deleted.
Delete an Apache Paimon Catalog by Executing an SQL Statement
- Create a blank streaming draft. For more information, see Create a SQL draft.
  Note: The engine version must be vera-1.0.3-flink-1.13 or later.
- In the code editor, enter the following statement:
DROP CATALOG <catalog name>;
  In this statement, <catalog name> indicates the name of the Apache Paimon catalog that you want to delete.
- Right-click the statement that is used to delete the catalog and choose Run from the shortcut menu.
- View the Catalogs pane on the left side of the Catalog List page to check whether the catalog is deleted.
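As a concrete illustration of the SQL steps above, the following sketch drops a catalog and then lists the remaining catalogs to confirm the result. The name paimoncatalog is a placeholder, and the IF EXISTS clause is the standard Flink SQL form, assumed here to be accepted by your engine version.

```sql
-- Placeholder catalog name; IF EXISTS avoids an error if the catalog is already gone.
DROP CATALOG IF EXISTS paimoncatalog;

-- Confirm that the catalog no longer appears in the list.
SHOW CATALOGS;
```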