Catalogs & Databases¶
Catalogs are used to store all metadata about database objects, such as databases, tables, table attributes, functions, and views. The catalog metadata is accessed when a SQL query is parsed, validated, and optimized. Only database objects which are registered in a catalog can be referenced in SQL queries.
A catalog object can be addressed with a fully or a partially specified name. The fully-specified name of a table is
`<catalog-name>`.`<database-name>`.`<table-name>`. For a partially-specified name the prefix of catalog and/or database name can be omitted if those are currently selected. Catalog, database, table, or function names that do not comply with the SQL naming specification need to be quoted with the
Every namespace has its own built-in VVP Catalog named
vvp with a default database named
default. The default catalog of a namespace cannot be dropped.
default is a keyword in SQL, you need to escape the name of
vvp’s default database as
Ververica Platform packages the following catalogs:
Support for additional packaged catalogs, like Confluent® schema registry, is planned in an upcoming release.
VVP Catalog is a catalog implementation that is integrated with Ververica Platform. It does not require additional setup steps and does not have any additional dependencies. VVP Catalog stores its metadata in Ververica Platform’s persistence layer.
A VVP Catalog
mycatalog with a default database
mydb is created with the following DDL statement:
CREATE CATALOG mycatalog WITH ( 'type' = 'vvp', -- mandatory 'defaultDatabase' = 'main' -- mandatory );
VVP Catalog ensures that the configured default database is always present. The default database can be deleted but is automatically re-created as soon as the catalog is accessed again.
The following database objects are currently supported by VVP Catalog:
- See Database DDL to learn how to create, alter, and drop databases.
- See Tables & Views to learn how to create, alter, and drop tables.
- User-Defined Functions
- See Functions to learn how to implement and manage user-defined functions.
User-defined functions can only be registered in VVP Catalogs.
Not supported features at this point are:
- Table Partitions
- Table, Column, and Partition Statistics
We will add these features in future releases depending on user feedback.
With the JDBC catalog you can connect to a remote database and expose its tables in Apache Flink® SQL.
Ververica Platform currently only ships with the JDBC driver for Postgres. Other drivers are not yet supported through the packaged connector.
CREATE CATALOG pg_catalog WITH ( 'type' = 'jdbc', 'base-url' = 'jdbc:postgresql://…', 'default-database' = '…', 'username' = '…', 'password' = '…' )
See the documentation for more information on the JDBC catalog and available options.
Manage Packaged Catalogs¶
Packaged catalogs consist of
- A YAML metadata file providing Ververica Platform with structured information about the catalog.
- The JAR file(s) including the catalog, its factory, and all required dependencies.
These files are located in the Ververica Platform Gateway docker image in a dedicated directory for
each catalog under
You can add your own packaged catalog, making it available in all Ververica Platform namespaces, by patching the Docker image. Simply add a new folder for the catalog under the directory given above and include the metadata file as well as all relevant JAR files.
Similarly, you can remove or modify packaged catalogs by removing over replacing the contents of its folder.
Packaged catalogs are not per-namespace resources. Any modifications will affect all namespaces.
Catalog Metadata File¶
The metadata file of a catalog must be named
catalog-meta.yaml and has the following schema:
catalog: type: # Factory identifier of the catalog, e.g. "jdbc". This must be the same as the folder name. packaged: # This must be set to true for packaged catalogs. readOnly: # If true, the catalog will be read-only and all write operations will be blocked. properties: # List of configuration options. - key: # The key of the configuration option, e.g. "type". required: # Whether this is a required configuration option. description: # Description of the configuration option. defaultValue: # Default value if this option is not specified. - …
Patching the Gateway Docker Image¶
The following example shows the
Dockerfile to add a new catalog called
my-catalog to the VVP Gateway Docker image.
FROM registry.ververica.com/v2.10/vvp-gateway:2.10.2 COPY catalog-meta.yaml /vvp/sql/opt/catalogs/my-catalog/catalog-meta.yaml COPY my-catalog-0.1.jar /vvp/sql/opt/catalogs/my-catalog/my-catalog-0.1.jar
If you need to use a catalog for which Ververica Platform does not provide a packaged catalog, you can register a custom catalog. Custom catalogs are namespaced resources, and unlike packaged catalogs can only be used in namespaces in which they have been registered.
A custom catalog is defined by one or more JAR files which include all required dependencies. These files must either be located in the Universal Blob Storage or at HTTP locations which are accessible by Ververica Platform.
Every catalog has a unique identifier defined by the catalog’s implementation of the factory class
org.apache.flink.table.factories.CatalogFactory). This identifier is referenced in the
WITH clause of a
CREATE CATALOG statement using the
Since catalog identifiers must be unique, it is not possible to add a custom catalog if there exists another catalog (packaged or custom) with the same identifier already in the same namespace.
In case a Ververica Platform upgrade adds a new packaged catalog with the same identifier as a previously existing custom catalog, the custom catalog will shadow the newly added packaged catalog. This ensures that the behavior of existing queries does not change.
You can create, update and delete custom catalogs using the corresponding REST endpoints or through the Ververica Platform UI. The UI supports uploading JAR files to Universal Blob Storage and extracting available catalog implementations from the uploaded files.
Apache Hive® Catalog¶
Apache Flink® Hive Catalog imports table metadata directly from your Apache Hive® Metastore. Once configured, you can read from and write into Hive tables with Flink SQL. The following section describes how a Hive catalog can be added after providing it as a custom catalog in Ververica Platform.
In order to create a Hive Catalog, you first need to make your
hive-site.xml available to Ververica Platform by mounting it into the Ververica Platform containers.
For example, you can store your
hive-site.xml in a Kubernetes ConfigMap,
apiVersion: v1 kind: ConfigMap metadata: name: hive-metastore-config-client namespace: vvp labels: app.kubernetes.io/name: vvp data: hive-site.xml: | <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>hive.metastore.uris</name> <value>thrift://hive-metastore.hive.svc:9083</value> </property> ... </configuration>
that you then mount into Ververica Platform via the corresponding Helm chart values.
# values.yaml volumeMounts: - name: hive-site mountPath: /etc/hive volumes: - name: hive-site configMap: name: hive-metastore-config-client
Afterwards, you can create a Hive Catalog via the
CREATE CATALOG DDL statement.
CREATE CATALOG hive_metastore WITH ( 'type' = 'hive', -- mandatory 'hive-conf-dir' = '/etc/hive', -- mandatory 'hive-version' = '3.1.2' -- mandatory )
Ververica Platform supports Amazon S3 (via Hadoop’s S3 filesystem implementation
s3a) as well as Apache Hadoop® HDFS as Hive storage layer out-of-the-box.
If you are using a different storage layer, please reach out.
- Hive built-in functions and user-defined functions stored in Hive are not supported by Ververica Platform yet.
- Hive’s SQL dialect is not supported.
Catalogs can be registered to a namespace. Once a catalog is registered, its tables can be accessed by SQL queries.
Registers a new catalog to the namespace.
CREATE CATALOG catalog_name WITH (key1=val1, key2=val2, ...)
The name of the new catalog must comply with Ververica Platform’s resource name rules.
For example, capital letters and the underscore (
_) character are not allowed in catalog names.
The type of the catalog must be specified using the
For example, a new VVP Catalog named
mycatalog with a default database called
main is created with the following statement:
CREATE CATALOG mycatalog WITH ( 'type' = 'vvp', 'defaultDatabase' = 'main');
See Packaged Catalogs for a list of catalog types which come bundled with Ververica Platform.
Unregisters the catalog from the namespace. The metadata of the catalog (databases, tables, views, functions, etc.) is not deleted.
DROP CATALOG [IF EXISTS] catalog_name
The built-in catalog
vvp of a namespace cannot be dropped.
Databases are used to organize tables, views, and functions in a catalog. You can create, alter, or drop databases.
Creates a new database in a catalog.
CREATE DATABASE [IF NOT EXISTS] [catalog_name.]db_name [COMMENT database_comment] [WITH (key1=val1, key2=val2, ...)]
Sets or overrides one ore more properties of an existing database.
ALTER DATABASE [catalog_name.]db_name SET (key1=val1, key2=val2, ...)
Drops a database from a catalog.
DROP DATABASE [IF EXISTS] [catalog_name.]db_name [ (RESTRICT | CASCADE) ]
- Attempting to drop a non-empty database, i.e., a database that contains tables or functions, will fail.This is the default behavior.
- All tables and functions of a database are dropped before the database itself is dropped.