Catalogs & Databases
Catalogs are used to store all metadata about database objects, such as databases, tables, table attributes, functions, and views. The catalog metadata is accessed when a SQL query is parsed, validated, and optimized. Only database objects which are registered in a catalog can be referenced in SQL queries.
A catalog object can be addressed with a fully or a partially specified name. The fully-specified name of a table is <catalog-name>`.`<database-name>`.`<table-name>
. For a partially-specified name the prefix of catalog and/or database name can be omitted if those are currently selected. Catalog, database, table, or function names that do not comply with the SQL naming specification need to be quoted with the ` character.
Ververica Platform’s Built-In Catalog
Every namespace has its own built-in VVP Catalog named vvp
with a default database named default
. The default catalog of a namespace cannot be dropped.
Since default
is a keyword in SQL, you need to escape the name of vvp
’s default database as default
.
Database DDL
Databases are used to organize tables, views, and functions in a catalog. You can create, alter, or drop databases.
CREATE DATABASE
Creates a new database in a catalog.
CREATE DATABASE [IF NOT EXISTS] [catalog_name.]db_name
[COMMENT database_comment]
[WITH (key1=val1, key2=val2, ...)]
ALTER DATABASE
Sets or overrides one ore more properties of an existing database.
ALTER DATABASE [catalog_name.]db_name SET (key1=val1, key2=val2, ...)
DROP DATABASE
Drops a database from a catalog.
DROP DATABASE [IF EXISTS] [catalog_name.]db_name [ (RESTRICT | CASCADE) ]
RESTRICT
Attempting to drop a non-empty database, i.e., a database that contains tables or functions, will fail. This is the default behavior.
CASCADE
All tables and functions of a database are dropped before the database itself is dropped.
Catalog DDL
Catalogs can be registered to a namespace. Once a catalog is registered, its tables can be accessed by SQL queries.
CREATE CATALOG
Registers a new catalog to the namespace.
CREATE CATALOG catalog_name
WITH (key1=val1, key2=val2, ...)
The name of the new catalog must comply with Ververica Platform’s resource name rules. For example, capital letters and the underscore (_) character are not allowed in catalog names.
The type of the catalog must be specified using the 'type'
property. For example, a new VVP Catalog named mycatalog
with a default database called main
is created with the following statement:
CREATE CATALOG mycatalog
WITH (
'type' = 'vvp',
'defaultDatabase' = 'main');
See Supported Catalogs for a list of all supported catalog types.
DROP CATALOG
Unregisters the catalog from the namespace. The metadata of the catalog (databases, tables, views, functions, etc.) is not deleted.
DROP CATALOG [IF EXISTS] catalog_name
The built-in catalog vvp
of a namespace cannot be dropped.
Supported Catalogs
VVP Catalog
VVP Catalog is a catalog implementation that is integrated with Ververica Platform. It does not require additional setup steps and does not have any additional dependencies. VVP Catalog stores its metadata in Ververica Platform’s persistence layer.
A VVP Catalog mycatalog
with a default database mydb is created with the following DDL statement:
CREATE CATALOG mycatalog
WITH (
'type' = 'vvp', -- mandatory
'defaultDatabase' = 'main' -- mandatory
);
VVP Catalog ensures that the configured default database is always present. The default database can be deleted but is automatically re-created as soon as the catalog is accessed again.
The following database objects are currently supported by VVP Catalog:
Databases
See Database DDL to learn how to create, alter, and drop databases.
Tables
See Tables & Views to learn how to create, alter, and drop tables.
User-Defined Functions
See Functions to learn how to implement and manage user-defined functions.
User-defined functions can only be registered in VVP Catalogs.
Not supported features at this point are:
- Views
- Table Partitions
- Table, Column, and Partition Statistics
We will add these features in future releases depending on user feedback.
External Catalogs
External catalogs provide access to data stored in external storage systems via Flink SQL without the need to explicitly create tables. For that, an external catalog converts the metadata provided by the external system into Flink catalog metadata and makes it accessible to Apache Flink®.
Ververica Platform packages the following external catalogs:
- Apache Hive® Catalog
Support for additional packaged catalogs, like Confluent® schema registery and JDBC, and custom external catalogs is planned in an upcoming release.
Apache Hive® Catalog
Apache Flink® Hive Catalog imports table metadata directly from your Apache Hive® Metastore. Once configured, you can read from and write into Hive tables with Flink SQL.
In order to create a Hive Catalog, you first need to make your hive-site.xml
available to Ververica Platform by mounting it into the Ververica Platform containers. For example, you can store your hive-site.xml
in a Kubernetes ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
name: hive-metastore-config-client
namespace: vvp
labels:
app.kubernetes.io/name: vvp
data:
hive-site.xml: |
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hive-metastore.hive.svc:9083</value>
</property>
...
</configuration>
that you then mount into Ververica Platform via the corresponding Helm chart values.
# values.yaml
volumeMounts:
- name: hive-site
mountPath: /etc/hive
volumes:
- name: hive-site
configMap:
name: hive-metastore-config-client
Afterwards, you can create a Hive Catalog via the CREATE CATALOG
DDL statement.
CREATE CATALOG hive_metastore
WITH (
'type' = 'hive', -- mandatory
'hive-conf-dir' = '/etc/hive', -- mandatory
'hive-version' = '3.1.2' -- mandatory
)
Ververica Platform supports Amazon S3 (via Hadoop’s S3 filesystem implementation s3a
) as well as Apache Hadoop® HDFS as Hive storage layer out-of-the-box. If you are using a different storage layer, please reach out.
Please see Apache Hive® for how to read from or write into Hive tables via Flink SQL.
Limitations
- Ververica Platform ships with Hive 3.1.2 only. To our knowledge this version is compatible with Hive 3.0.0 to 3.1.2. More versions will be supported in the future.
- Hive built-in functions and user-defined functions stored in Hive are not supported by Ververica Platform yet.
- Hive’s SQL dialect is not supported.
- The Hive Catalog is read-only.
ALTER TABLE
orCREATE TABLE
statements are not supported. If needed, table properties can be added on-demand via dynamic table options.