Skip to main content
Version: 2.12

Catalogs & Databases

Catalogs are used to store all metadata about database objects, such as databases, tables, table attributes, functions, and views. The catalog metadata is accessed when a SQL query is parsed, validated, and optimized. Only database objects which are registered in a catalog can be referenced in SQL queries.

A catalog object can be addressed with a fully or a partially specified name. The fully-specified name of a table is <catalog-name>`.`<database-name>`.`<table-name>. For a partially-specified name the prefix of catalog and/or database name can be omitted if those are currently selected. Catalog, database, table, or function names that do not comply with the SQL naming specification need to be quoted with the backtick (` ) character.

Built-In Catalog

Every namespace has its own built-in VVP Catalog named vvp with a default database named default. The default catalog of a namespace cannot be dropped.

note

Since default is a keyword in SQL, you need to escape the name of vvp's default database as default.

Packaged Catalogs

Ververica Platform packages the following catalogs:

Support for additional packaged catalogs, like Confluent® schema registry, is planned in an upcoming release.

VVP Catalog

VVP Catalog is a catalog implementation that is integrated with Ververica Platform. It does not require additional setup steps and does not have any additional dependencies. VVP Catalog stores its metadata in Ververica Platform's persistence layer.

A VVP Catalog mycatalog with a default database mydb is created with the following DDL statement:

CREATE CATALOG mycatalog
WITH (
'type' = 'vvp', -- mandatory
'defaultDatabase' = 'main' -- mandatory
);

VVP Catalog ensures that the configured default database is always present. The default database can be deleted but is automatically re-created as soon as the catalog is accessed again.

The following database objects are currently supported by VVP Catalog:

  • Databases

See Database DDL to learn how to create, alter, and drop databases.

  • Tables See Tables & Views to learn how to create, alter, and drop tables.
  • User-Defined Functions See Functions to learn how to implement and manage user-defined functions.
note

User-defined functions can only be registered in VVP Catalogs.

Not supported features at this point are:

  • Table Partitions
  • Table, Column, and Partition Statistics

We will add these features in future releases depending on user feedback.

JDBC Catalog

With the JDBC catalog you can connect to a remote database and expose its tables in Apache Flink® SQL.

note

Ververica Platform currently only ships with the JDBC driver for Postgres. Other drivers are not yet supported through the packaged connector.

CREATE CATALOG pg_catalog WITH (
'type' = 'jdbc',
'base-url' = 'jdbc:postgresql://…',
'default-database' = '…',
'username' = '…',
'password' = '…'
)

See the documentation for more information on the JDBC catalog and available options.

Manage Packaged Catalogs

Packaged catalogs consist of

  • A YAML metadata file providing Ververica Platform with structured information about the catalog.
  • The JAR file(s) including the catalog, its factory, and all required dependencies.

These files are located in the Ververica Platform Gateway docker image in a dedicated directory for each catalog under /vvp/sql/opt/catalogs.

You can add your own packaged catalog, making it available in all Ververica Platform namespaces, by patching the Docker image. Simply add a new folder for the catalog under the directory given above and include the metadata file as well as all relevant JAR files.

Similarly, you can remove or modify packaged catalogs by removing over replacing the contents of its folder.

note

Packaged catalogs are not per-namespace resources. Any modifications will affect all namespaces.

Catalog Metadata File

The metadata file of a catalog must be named catalog-meta.yaml and has the following schema:

catalog:
type: # Factory identifier of the catalog, e.g. "jdbc". This must be the same as the folder name.
packaged: # This must be set to true for packaged catalogs.
readOnly: # If true, the catalog will be read-only and all write operations will be blocked.
properties: # List of configuration options.
- key: # The key of the configuration option, e.g. "type".
required: # Whether this is a required configuration option.
description: # Description of the configuration option.
defaultValue: # Default value if this option is not specified.
-

Patching the Gateway Docker Image

The following example shows the Dockerfile to add a new catalog called my-catalog to the VVP Gateway Docker image.

FROM registry.ververica.com/v2.10/vvp-gateway:2.10.3
COPY catalog-meta.yaml /vvp/sql/opt/catalogs/my-catalog/catalog-meta.yaml
COPY my-catalog-0.1.jar /vvp/sql/opt/catalogs/my-catalog/my-catalog-0.1.jar

Custom Catalogs

If you need to use a catalog for which Ververica Platform does not provide a packaged catalog, you can register a custom catalog. Custom catalogs are namespaced resources, and unlike packaged catalogs can only be used in namespaces in which they have been registered.

A custom catalog is defined by one or more JAR files which include all required dependencies. These files must either be located in the Universal Blob Storage or at HTTP locations which are accessible by Ververica Platform.

Every catalog has a unique identifier defined by the catalog's implementation of the factory class (see org.apache.flink.table.factories.CatalogFactory). This identifier is referenced in the WITH clause of a CREATE CATALOG statement using the type property.

Since catalog identifiers must be unique, it is not possible to add a custom catalog if there exists another catalog (packaged or custom) with the same identifier already in the same namespace.

note

In case a Ververica Platform upgrade adds a new packaged catalog with the same identifier as a previously existing custom catalog, the custom catalog will shadow the newly added packaged catalog. This ensures that the behavior of existing queries does not change.

You can create, update and delete custom catalogs using the corresponding REST endpoints or through the Ververica Platform UI. The UI supports uploading JAR files to Universal Blob Storage and extracting available catalog implementations from the uploaded files.

Apache Hive® Catalog

Apache Flink® Hive Catalog imports table metadata directly from your Apache Hive® Metastore. Once configured, you can read from and write into Hive tables with Flink SQL. The following section describes how a Hive catalog can be added after providing it as a custom catalog in Ververica Platform.

In order to create a Hive Catalog, you first need to make your hive-site.xml available to Ververica Platform by mounting it into the Ververica Platform containers. For example, you can store your hive-site.xml in a Kubernetes ConfigMap,

apiVersion: v1
kind: ConfigMap
metadata:
name: hive-metastore-config-client
namespace: vvp
labels:
app.kubernetes.io/name: vvp
data:
hive-site.xml: |
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hive-metastore.hive.svc:9083</value>
</property>
...
</configuration>

that you then mount into Ververica Platform via the corresponding Helm chart values.

# values.yaml
volumeMounts:
- name: hive-site
mountPath: /etc/hive

volumes:
- name: hive-site
configMap:
name: hive-metastore-config-client

Afterwards, you can create a Hive Catalog via the CREATE CATALOG DDL statement.

CREATE CATALOG hive_metastore
WITH (
'type' = 'hive', -- mandatory
'hive-conf-dir' = '/etc/hive', -- mandatory
'hive-version' = '3.1.2' -- mandatory
)

Ververica Platform supports Amazon S3 (via Hadoop's S3 filesystem implementation s3a) as well as Apache Hadoop® HDFS as Hive storage layer out-of-the-box. If you are using a different storage layer, please reach out.

Limitations

  • Hive built-in functions and user-defined functions stored in Hive are not supported by Ververica Platform yet.
  • Hive's SQL dialect is not supported.

Catalog DDL

Catalogs can be registered to a namespace. Once a catalog is registered, its tables can be accessed by SQL queries.

CREATE CATALOG

Registers a new catalog to the namespace.

CREATE CATALOG catalog_name
WITH (key1=val1, key2=val2, ...)

The name of the new catalog must comply with Ververica Platform's resource name rules. For example, capital letters and the underscore (_) character are not allowed in catalog names.

The type of the catalog must be specified using the 'type' property. For example, a new VVP Catalog named mycatalog with a default database called main is created with the following statement:

CREATE CATALOG mycatalog
WITH (
'type' = 'vvp',
'defaultDatabase' = 'main');

See Packaged Catalogs for a list of catalog types which come bundled with Ververica Platform.

DROP CATALOG

Unregisters the catalog from namespace. The metadata of the catalog (databases, tables, views, functions, etc.) is not deleted.

DROP CATALOG [IF EXISTS] catalog_name
note

The built-in catalog vvp of a namespace cannot be dropped.

Database DDL

Databases are used to organize tables, views, and functions in a catalog. You can create, alter, or drop databases.

DDL commands normally run live against the target database. In an environment where Auto-DDL is disallowed, for example when deploying to a production environment, disable automatic database updates and configure Ververica Platform to generate an offline changeset from which to create or update the database. See Configuring Offline DDL Updates, below.

CREATE DATABASE

Creates a new database in a catalog.

CREATE DATABASE [IF NOT EXISTS] [catalog_name.]db_name
[COMMENT database_comment]
[WITH (key1=val1, key2=val2, ...)]

ALTER DATABASE

Sets or overrides one ore more properties of an existing database.

ALTER DATABASE [catalog_name.]db_name SET (key1=val1, key2=val2, ...)

DROP DATABASE

Drops a database from a catalog.

DROP DATABASE [IF EXISTS] [catalog_name.]db_name [ (RESTRICT | CASCADE) ]

RESTRICT Attempting to drop a non-empty database, i.e., a database that contains tables or functions, will fail. This is the default behavior.

CASCADE All tables and functions of a database are dropped before the database itself is dropped.