Catalogs & Databases

Catalogs are used to store all metadata about database objects, such as databases, tables, table attributes, functions, and views. The catalog metadata is accessed when a SQL query is parsed, validated, and optimized. Only database objects which are registered in a catalog can be referenced in SQL queries.

A catalog object can be addressed with a fully or a partially specified name. The fully-specified name of a table is <catalog-name>`.`<database-name>`.`<table-name>. For a partially-specified name the prefix of catalog and/or database name can be omitted if those are currently selected. Catalog, database, table, or function names that do not comply with the SQL naming specification need to be quoted with the ` character.

Ververica Platform’s Built-In Catalog

Every namespace has its own built-in VVP Catalog named vvp with a default database named default. The default catalog of a namespace cannot be dropped.

note

Since default is a keyword in SQL, you need to escape the name of vvp’s default database as default.

Database DDL

Databases are used to organize tables, views, and functions in a catalog. You can create, alter, or drop databases.

CREATE DATABASE

Creates a new database in a catalog.

CREATE DATABASE [IF NOT EXISTS] [catalog_name.]db_name
[COMMENT database_comment]
[WITH (key1=val1, key2=val2, ...)]

ALTER DATABASE

Sets or overrides one ore more properties of an existing database.

ALTER DATABASE [catalog_name.]db_name SET (key1=val1, key2=val2, ...)

DROP DATABASE

Drops a database from a catalog.

DROP DATABASE [IF EXISTS] [catalog_name.]db_name [ (RESTRICT | CASCADE) ]

RESTRICT

Attempting to drop a non-empty database, i.e., a database that contains tables or functions, will fail. This is the default behavior.

CASCADE

All tables and functions of a database are dropped before the database itself is dropped.

Catalog DDL

Catalogs can be registered to a namespace. Once a catalog is registered, its tables can be accessed by SQL queries.

CREATE CATALOG

Registers a new catalog to the namespace.

CREATE CATALOG catalog_name
WITH (key1=val1, key2=val2, ...)

The name of the new catalog must comply with Ververica Platform’s resource name rules. For example, capital letters and the underscore (_) character are not allowed in catalog names.

The type of the catalog must be specified using the 'type' property. For example, a new VVP Catalog named mycatalog with a default database called main is created with the following statement:

CREATE CATALOG mycatalog
WITH (
  'type' = 'vvp',
  'defaultDatabase' = 'main');

See Supported Catalogs for a list of all supported catalog types.

DROP CATALOG

Unregisters the catalog from the namespace. The metadata of the catalog (databases, tables, views, functions, etc.) is not deleted.

DROP CATALOG [IF EXISTS] catalog_name

note

The built-in catalog vvp of a namespace cannot be dropped.

Supported Catalogs

VVP Catalog

VVP Catalog is a catalog implementation that is integrated with Ververica Platform. It does not require additional setup steps and does not have any additional dependencies. VVP Catalog stores its metadata in Ververica Platform’s persistence layer.

A VVP Catalog mycatalog with a default database mydb is created with the following DDL statement:

CREATE CATALOG mycatalog
WITH (
  'type' = 'vvp',             -- mandatory
  'defaultDatabase' = 'main'  -- mandatory
);

VVP Catalog ensures that the configured default database is always present. The default database can be deleted but is automatically re-created as soon as the catalog is accessed again.

The following database objects are currently supported by VVP Catalog:

Databases

See Database DDL to learn how to create, alter, and drop databases.

Tables

See Tables & Views to learn how to create, alter, and drop tables.

User-Defined Functions

See Functions to learn how to implement and manage user-defined functions.

note

User-defined functions can only be registered in VVP Catalogs.

Not supported features at this point are:

Views
Table Partitions
Table, Column, and Partition Statistics

We will add these features in future releases depending on user feedback.

External Catalogs

External catalogs provide access to data stored in external storage systems via Flink SQL without the need to explicitly create tables. For that, an external catalog converts the metadata provided by the external system into Flink catalog metadata and makes it accessible to Apache Flink®.

Ververica Platform packages the following external catalogs:

Apache Hive® Catalog

Support for additional packaged catalogs, like Confluent® schema registery and JDBC, and custom external catalogs is planned in an upcoming release.

Apache Hive® Catalog

Apache Flink® Hive Catalog imports table metadata directly from your Apache Hive® Metastore. Once configured, you can read from and write into Hive tables with Flink SQL.

In order to create a Hive Catalog, you first need to make your hive-site.xml available to Ververica Platform by mounting it into the Ververica Platform containers. For example, you can store your hive-site.xml in a Kubernetes ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: hive-metastore-config-client
  namespace: vvp
  labels:
    app.kubernetes.io/name: vvp
data:
  hive-site.xml: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
       <property>
         <name>hive.metastore.uris</name>
         <value>thrift://hive-metastore.hive.svc:9083</value>
       </property>
       ...
    </configuration>

that you then mount into Ververica Platform via the corresponding Helm chart values.

# values.yaml
volumeMounts:
  - name: hive-site
    mountPath: /etc/hive

volumes:
  - name: hive-site
    configMap:
      name: hive-metastore-config-client

Afterwards, you can create a Hive Catalog via the CREATE CATALOG DDL statement.

CREATE CATALOG hive_metastore
WITH (
  'type' = 'hive',                  -- mandatory
  'hive-conf-dir' = '/etc/hive',    -- mandatory
  'hive-version' = '3.1.2'          -- mandatory
)

Ververica Platform supports Amazon S3 (via Hadoop’s S3 filesystem implementation s3a) as well as Apache Hadoop® HDFS as Hive storage layer out-of-the-box. If you are using a different storage layer, please reach out.

note

Please see Apache Hive® for how to read from or write into Hive tables via Flink SQL.

Limitations

Ververica Platform ships with Hive 3.1.2 only. To our knowledge this version is compatible with Hive 3.0.0 to 3.1.2. More versions will be supported in the future.
Hive built-in functions and user-defined functions stored in Hive are not supported by Ververica Platform yet.
Hive’s SQL dialect is not supported.
The Hive Catalog is read-only. ALTER TABLE or CREATE TABLE statements are not supported. If needed, table properties can be added on-demand via dynamic table options.

Ververica Platform’s Built-In Catalog​

Database DDL​

CREATE DATABASE​

ALTER DATABASE​

DROP DATABASE​

RESTRICT​

CASCADE​

Catalog DDL​

CREATE CATALOG​

DROP CATALOG​

Supported Catalogs​

VVP Catalog​

Databases​

Tables​

User-Defined Functions​

External Catalogs​

Apache Hive® Catalog​

Limitations​

Ververica Platform’s Built-In Catalog

Database DDL

CREATE DATABASE

ALTER DATABASE

DROP DATABASE

RESTRICT

CASCADE

Catalog DDL

CREATE CATALOG

DROP CATALOG

Supported Catalogs

VVP Catalog

Databases

Tables

User-Defined Functions

External Catalogs

Apache Hive® Catalog

Limitations