Catalogs & Databases¶
Catalogs are used to store all metadata about database objects, such as databases, tables, table attributes, functions, and views. The catalog metadata is accessed when a SQL query is parsed, validated, and optimized. Only database objects which are registered in a catalog can be referenced in SQL queries.
A catalog object can be addressed with a fully or a partially specified name. The fully-specified name of a table is `<catalog-name>`.`<database-name>`.`<table-name>`. For a partially-specified name, the catalog and/or database prefix can be omitted if that catalog and database are currently selected. Catalog, database, table, or function names that do not comply with the SQL naming specification need to be quoted with the backtick (`) character. For example, because default is a keyword in SQL, you need to escape the name of vvp’s default database as `default`.
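As a minimal sketch of these naming rules, the following statements address the same table once with a fully specified and once with a partially specified name. The table name orders is hypothetical, and the example assumes that the USE CATALOG and USE statements are available in your SQL environment for selecting the current catalog and database.

```sql
-- Fully specified name; `default` is quoted because it is a SQL keyword.
SELECT * FROM `vvp`.`default`.`orders`;

-- Select the current catalog and database (assumed to be supported).
USE CATALOG vvp;
USE `default`;

-- Partially specified name: the catalog and database prefixes are omitted.
SELECT * FROM orders;
```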
Databases are used to organize tables, views, and functions in a catalog. You can create, alter, or drop databases.
Creates a new database in a catalog.
CREATE DATABASE [IF NOT EXISTS] [catalog_name.]db_name [COMMENT database_comment] [WITH (key1=val1, key2=val2, ...)]
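For illustration, a statement following this syntax might look like the sketch below. The catalog name mycatalog, the database name analytics, and the property key owner are hypothetical. Property keys and values are written as quoted string literals, in the same way as in the CREATE CATALOG statements in this document.

```sql
-- Create the database only if it does not exist yet.
CREATE DATABASE IF NOT EXISTS mycatalog.analytics
COMMENT 'Tables maintained by the analytics team'
WITH ('owner' = 'analytics-team');  -- hypothetical property
```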
Sets or overrides one or more properties of an existing database.
ALTER DATABASE [catalog_name.]db_name SET (key1=val1, key2=val2, ...)
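A hedged example of this syntax, using a hypothetical database mycatalog.analytics and a hypothetical property key owner:

```sql
-- Sets the property if it is missing, or overrides its current value.
ALTER DATABASE mycatalog.analytics SET ('owner' = 'platform-team');
```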
Drops a database from a catalog.
DROP DATABASE [IF EXISTS] [catalog_name.]db_name [ (RESTRICT | CASCADE) ]
- RESTRICT: Attempting to drop a non-empty database, i.e., a database that contains tables or functions, will fail. This is the default behavior.
- CASCADE: All tables and functions of a database are dropped before the database itself is dropped.
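The two drop modes can be sketched as follows. The database name mycatalog.analytics is hypothetical.

```sql
-- Fails if the database still contains tables or functions (default behavior).
DROP DATABASE IF EXISTS mycatalog.analytics RESTRICT;

-- Drops all tables and functions of the database, then the database itself.
DROP DATABASE IF EXISTS mycatalog.analytics CASCADE;
```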
Catalogs can be registered to a namespace. Once a catalog is registered, its tables can be accessed by SQL queries.
Registers a new catalog to the namespace.
CREATE CATALOG catalog_name WITH (key1=val1, key2=val2, ...)
The name of the new catalog must comply with Ververica Platform’s resource name rules. For example, capital letters and the underscore (_) character are not allowed in catalog names.
The type of the catalog must be specified using the 'type' property. For example, a new VVP Catalog named mycatalog with a default database called main is created with the following statement:
CREATE CATALOG mycatalog WITH ( 'type' = 'vvp', 'defaultDatabase' = 'main');
See Supported Catalogs for a list of all supported catalog types.
VVP Catalog is a catalog implementation that is integrated with Ververica Platform. It does not require additional setup steps and does not have any additional dependencies. VVP Catalog stores its metadata in Ververica Platform’s persistence layer.
A VVP Catalog mycatalog with a default database main is created with the following DDL statement:
CREATE CATALOG mycatalog WITH (
  'type' = 'vvp',             -- mandatory
  'defaultDatabase' = 'main'  -- mandatory
);
VVP Catalog ensures that the configured default database is always present. The default database can be deleted but is automatically re-created as soon as the catalog is accessed again.
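Because the default database always exists, objects can be registered in it right after the catalog has been created. The following sketch registers a table in mycatalog.main and queries it. The table name, its columns, and the use of Flink's datagen connector are assumptions made for illustration only.

```sql
-- Register a table in the default database of the VVP Catalog.
CREATE TABLE mycatalog.main.clicks (
  user_id BIGINT,
  url     STRING
) WITH (
  'connector' = 'datagen',   -- assumed to be available in your deployment
  'rows-per-second' = '5'
);

-- Query the table using its fully specified name.
SELECT user_id, url FROM mycatalog.main.clicks;
```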
The following database objects are currently supported by VVP Catalog:
User-defined functions can only be registered in VVP Catalogs.
The following features are not supported at this point:
- Table Partitions
- Table, Column, and Partition Statistics
We will add these features in future releases depending on user feedback.
External catalogs provide access to data stored in external storage systems via Flink SQL without the need to explicitly create tables. For that, an external catalog converts the metadata provided by the external system into Flink catalog metadata and makes it accessible to Apache Flink®.
Ververica Platform packages the following external catalogs:
Support for additional packaged catalogs, such as Confluent® Schema Registry and JDBC, as well as for custom external catalogs, is planned for an upcoming release.
Apache Hive® Catalog¶
Apache Flink® Hive Catalog imports table metadata directly from your Apache Hive® Metastore. Once configured, you can read from and write into Hive tables with Flink SQL.
In order to create a Hive Catalog, you first need to make your hive-site.xml available to Ververica Platform by mounting it into the Ververica Platform containers.
For example, you can store your hive-site.xml in a Kubernetes ConfigMap,
apiVersion: v1
kind: ConfigMap
metadata:
  name: hive-metastore-config-client
  namespace: vvp
  labels:
    app.kubernetes.io/name: vvp
data:
  hive-site.xml: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hive-metastore.hive.svc:9083</value>
      </property>
      ...
    </configuration>
that you then mount into Ververica Platform via the corresponding Helm Chart values.
# values.yaml
volumeMounts:
  - name: hive-site
    mountPath: /etc/hive
volumes:
  - name: hive-site
    configMap:
      name: hive-metastore-config-client
Afterwards, you can create a Hive Catalog via the CREATE CATALOG DDL statement.
CREATE CATALOG hive_metastore WITH (
  'type' = 'hive',               -- mandatory
  'hive-conf-dir' = '/etc/hive', -- mandatory
  'hive-version' = '3.1.2'       -- mandatory
)
Ververica Platform supports Amazon S3 (via Hadoop’s S3 filesystem implementation s3a) as well as Apache Hadoop® HDFS as the Hive storage layer out of the box. If you are using a different storage layer, please reach out.
Please see Apache Hive® for how to read from or write into Hive tables via Flink SQL.
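As a rough sketch, once the catalog is registered, Hive tables can be addressed by their fully specified names. The database name sales and the table names orders and orders_archive are hypothetical, and both tables are assumed to already exist in the Hive Metastore with compatible schemas.

```sql
-- Read from a Hive table.
SELECT * FROM hive_metastore.sales.orders;

-- Write the result of a query into another Hive table.
INSERT INTO hive_metastore.sales.orders_archive
SELECT * FROM hive_metastore.sales.orders;
```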
- Ververica Platform ships with Hive 3.1.2 only. To our knowledge this version is compatible with Hive 3.0.0 to 3.1.2. More versions will be supported in the future.
- Hive built-in functions and user-defined functions stored in Hive are not supported by Ververica Platform yet.
- Hive’s SQL dialect is not supported.
- The Hive Catalog is read-only. CREATE TABLE statements are not supported. If needed, table properties can be added on-demand via dynamic table options.
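Dynamic table options are passed as an OPTIONS hint directly after the table name. The sketch below assumes a hypothetical Hive table hive_metastore.sales.orders and that dynamic table options are enabled in your deployment. The option key streaming-source.enable is taken from the Flink Hive connector and serves only as an example.

```sql
-- Add a table property for this query only; the Hive metadata is not changed.
SELECT *
FROM hive_metastore.sales.orders
  /*+ OPTIONS('streaming-source.enable' = 'true') */;
```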