Catalogs & Databases
Catalogs are used to store all metadata about database objects, such as databases, tables, table attributes, functions, and views. The catalog metadata is accessed when a SQL query is parsed, validated, and optimized. Only database objects which are registered in a catalog can be referenced in SQL queries.
A catalog object can be addressed with a fully or a partially specified name. The fully-specified name of a table is <catalog-name>`.`<database-name>`.`<table-name>
. For a partially-specified name the prefix of catalog and/or database name can be omitted if those are currently selected. Catalog, database, table, or function names that do not comply with the SQL naming specification need to be quoted with the backtick (` ) character.
Built-In Catalog
Every namespace has its own built-in VVP Catalog named vvp
with a default database named default
. The default catalog of a namespace cannot be dropped.
Since default
is a keyword in SQL, you need to escape the name of vvp
's default database as default
.
Packaged Catalogs
Ververica Platform packages the following catalogs:
Support for additional packaged catalogs, like Confluent® schema registry, is planned in an upcoming release.
VVP Catalog
VVP Catalog is a catalog implementation that is integrated with Ververica Platform. It does not require additional setup steps and does not have any additional dependencies. VVP Catalog stores its metadata in Ververica Platform's persistence layer.
A VVP Catalog mycatalog
with a default database mydb
is created with the following DDL statement:
CREATE CATALOG mycatalog
WITH (
'type' = 'vvp', -- mandatory
'defaultDatabase' = 'main' -- mandatory
);
VVP Catalog ensures that the configured default database is always present. The default database can be deleted but is automatically re-created as soon as the catalog is accessed again.
The following database objects are currently supported by VVP Catalog:
- Databases
See Database DDL to learn how to create, alter, and drop databases.
-
Tables See Tables & Views to learn how to create, alter, and drop tables.
-
User-Defined Functions See Functions to learn how to implement and manage user-defined functions.
User-defined functions can only be registered in VVP Catalogs.
Not supported features at this point are:
- Table Partitions
- Table, Column, and Partition Statistics
We will add these features in future releases depending on user feedback.
JDBC Catalog
With the JDBC catalog you can connect to a remote database and expose its tables in Apache Flink® SQL.
Ververica Platform currently only ships with the JDBC driver for Postgres. Other drivers are not yet supported through the packaged connector.
CREATE CATALOG pg_catalog WITH (
'type' = 'jdbc',
'base-url' = 'jdbc:postgresql://…',
'default-database' = '…',
'username' = '…',
'password' = '…'
)
See the documentation for more information on the JDBC catalog and available options.
Manage Packaged Catalogs
Packaged catalogs consist of
- A YAML metadata file providing Ververica Platform with structured information about the catalog.
- The JAR file(s) including the catalog, its factory, and all required dependencies.
These files are located in the Ververica Platform Gateway docker image in a dedicated directory for
each catalog under /vvp/sql/opt/catalogs
.
You can add your own packaged catalog, making it available in all Ververica Platform namespaces, by patching the Docker image. Simply add a new folder for the catalog under the directory given above and include the metadata file as well as all relevant JAR files.
Similarly, you can remove or modify packaged catalogs by removing over replacing the contents of its folder.
Packaged catalogs are not per-namespace resources. Any modifications will affect all namespaces.
Catalog Metadata File
The metadata file of a catalog must be named catalog-meta.yaml
and has the following schema:
catalog:
type: # Factory identifier of the catalog, e.g. "jdbc". This must be the same as the folder name.
packaged: # This must be set to true for packaged catalogs.
readOnly: # If true, the catalog will be read-only and all write operations will be blocked.
properties: # List of configuration options.
- key: # The key of the configuration option, e.g. "type".
required: # Whether this is a required configuration option.
description: # Description of the configuration option.
defaultValue: # Default value if this option is not specified.
- …
Patching the Gateway Docker Image
The following example shows the Dockerfile
to add a new catalog called my-catalog
to the VVP Gateway Docker image.
FROM registry.ververica.com/v2.10/vvp-gateway:2.10.3
COPY catalog-meta.yaml /vvp/sql/opt/catalogs/my-catalog/catalog-meta.yaml
COPY my-catalog-0.1.jar /vvp/sql/opt/catalogs/my-catalog/my-catalog-0.1.jar
Custom Catalogs
If you need to use a catalog for which Ververica Platform does not provide a packaged catalog, you can register a custom catalog. Custom catalogs are namespaced resources, and unlike packaged catalogs can only be used in namespaces in which they have been registered.
A custom catalog is defined by one or more JAR files which include all required dependencies. These files must either be located in the Universal Blob Storage or at HTTP locations which are accessible by Ververica Platform.
Every catalog has a unique identifier defined by the catalog's implementation of the factory class
(see org.apache.flink.table.factories.CatalogFactory
). This identifier is referenced in the
WITH
clause of a CREATE CATALOG
statement using the type
property.
Since catalog identifiers must be unique, it is not possible to add a custom catalog if there exists another catalog (packaged or custom) with the same identifier already in the same namespace.
In case a Ververica Platform upgrade adds a new packaged catalog with the same identifier as a previously existing custom catalog, the custom catalog will shadow the newly added packaged catalog. This ensures that the behavior of existing queries does not change.
You can create, update and delete custom catalogs using the corresponding REST endpoints or through the Ververica Platform UI. The UI supports uploading JAR files to Universal Blob Storage and extracting available catalog implementations from the uploaded files.
Apache Hive® Catalog
Apache Flink® Hive Catalog imports table metadata directly from your Apache Hive® Metastore. Once configured, you can read from and write into Hive tables with Flink SQL. The following section describes how a Hive catalog can be added after providing it as a custom catalog in Ververica Platform.
In order to create a Hive Catalog, you first need to make your hive-site.xml
available to Ververica Platform by mounting it into the Ververica Platform containers.
For example, you can store your hive-site.xml
in a Kubernetes ConfigMap,
apiVersion: v1
kind: ConfigMap
metadata:
name: hive-metastore-config-client
namespace: vvp
labels:
app.kubernetes.io/name: vvp
data:
hive-site.xml: |
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hive-metastore.hive.svc:9083</value>
</property>
...
</configuration>
that you then mount into Ververica Platform via the corresponding Helm chart values.
# values.yaml
volumeMounts:
- name: hive-site
mountPath: /etc/hive
volumes:
- name: hive-site
configMap:
name: hive-metastore-config-client
Afterwards, you can create a Hive Catalog via the CREATE CATALOG
DDL statement.
CREATE CATALOG hive_metastore
WITH (
'type' = 'hive', -- mandatory
'hive-conf-dir' = '/etc/hive', -- mandatory
'hive-version' = '3.1.2' -- mandatory
)
Ververica Platform supports Amazon S3 (via Hadoop's S3 filesystem implementation s3a
) as well as Apache Hadoop® HDFS as Hive storage layer out-of-the-box.
If you are using a different storage layer, please reach out.
Limitations
- Hive built-in functions and user-defined functions stored in Hive are not supported by Ververica Platform yet.
- Hive's SQL dialect is not supported.
Catalog DDL
Catalogs can be registered to a namespace. Once a catalog is registered, its tables can be accessed by SQL queries.
CREATE CATALOG
Registers a new catalog to the namespace.
CREATE CATALOG catalog_name
WITH (key1=val1, key2=val2, ...)
The name of the new catalog must comply with Ververica Platform's resource name rules.
For example, capital letters and the underscore (_
) character are not allowed in catalog names.
The type of the catalog must be specified using the 'type'
property.
For example, a new VVP Catalog named mycatalog
with a default database called main
is created with the following statement:
CREATE CATALOG mycatalog
WITH (
'type' = 'vvp',
'defaultDatabase' = 'main');
See Packaged Catalogs for a list of catalog types which come bundled with Ververica Platform.
DROP CATALOG
Unregisters the catalog from namespace. The metadata of the catalog (databases, tables, views, functions, etc.) is not deleted.
DROP CATALOG [IF EXISTS] catalog_name
The built-in catalog vvp
of a namespace cannot be dropped.
Database DDL
Databases are used to organize tables, views, and functions in a catalog. You can create, alter, or drop databases.
DDL commands normally run live against the target database. In an environment where Auto-DDL is disallowed, for example when deploying to a production environment, disable automatic database updates and configure Ververica Platform to generate an offline changeset from which to create or update the database. See Configuring Offline DDL Updates, below.
CREATE DATABASE
Creates a new database in a catalog.
CREATE DATABASE [IF NOT EXISTS] [catalog_name.]db_name
[COMMENT database_comment]
[WITH (key1=val1, key2=val2, ...)]
ALTER DATABASE
Sets or overrides one ore more properties of an existing database.
ALTER DATABASE [catalog_name.]db_name SET (key1=val1, key2=val2, ...)
DROP DATABASE
Drops a database from a catalog.
DROP DATABASE [IF EXISTS] [catalog_name.]db_name [ (RESTRICT | CASCADE) ]
RESTRICT Attempting to drop a non-empty database, i.e., a database that contains tables or functions, will fail. This is the default behavior.
CASCADE All tables and functions of a database are dropped before the database itself is dropped.