Catalogs & Databases¶
Catalogs are used to store all metadata about database objects, such as databases, tables, table attributes, functions, and views. The catalog metadata is accessed when a SQL query is parsed, validated, and optimized. Only database objects which are registered in a catalog can be referenced in SQL queries.
A catalog object can be addressed with a fully or a partially specified name. The fully-specified name of a table is `<catalog-name>`.`<database-name>`.`<table-name>`
. For a partially-specified name the prefix of catalog and/or database name can be omitted if those are currently selected. Catalog, database, table, or function names that do not comply with the SQL naming specification need to be quoted with the `
character.
Ververica Platform’s Built-In Catalog¶
Every namespace has its own built-in VVP Catalog named vvp
with a default database named default
. The default catalog of a namespace cannot be dropped.
Note
Since default
is a keyword in SQL, you need to escape the name of vvp
’s default database as `default`
.
Database DDL¶
Databases are used to organize tables, views, and functions in a catalog. You can create, alter, or drop databases.
CREATE DATABASE¶
Creates a new database in a catalog.
CREATE DATABASE [IF NOT EXISTS] [catalog_name.]db_name
[COMMENT database_comment]
[WITH (key1=val1, key2=val2, ...)]
ALTER DATABASE¶
Sets or overrides one ore more properties of an existing database.
ALTER DATABASE [catalog_name.]db_name SET (key1=val1, key2=val2, ...)
DROP DATABASE¶
Drops a database from a catalog.
DROP DATABASE [IF EXISTS] [catalog_name.]db_name [ (RESTRICT | CASCADE) ]
- RESTRICT
- Attempting to drop a non-empty database, i.e., a database that contains tables or functions, will fail.This is the default behavior.
- CASCADE
- All tables and functions of a database are dropped before the database itself is dropped.
Catalog DDL¶
Catalogs can be registered to a namespace. Once a catalog is registered, its tables can be accessed by SQL queries.
CREATE CATALOG¶
Registers a new catalog to the namespace.
CREATE CATALOG catalog_name
WITH (key1=val1, key2=val2, ...)
The name of the new catalog must comply with Ververica Platform’s resource name rules.
For example, capital letters and the underscore (_
) character are not allowed in catalog names.
The type of the catalog must be specified using the 'type'
property.
For example, a new VVP Catalog named mycatalog
with a default database called main
is created with the following statement:
CREATE CATALOG mycatalog
WITH (
'type' = 'vvp',
'defaultDatabase' = 'main');
See Supported Catalogs for a list of all supported catalog types.
DROP CATALOG¶
Unregisters the catalog from the namespace. The metadata of the catalog (databases, tables, views, functions, etc.) is not deleted.
DROP CATALOG [IF EXISTS] catalog_name
Note
The built-in catalog vvp
of a namespace cannot be dropped.
Supported Catalogs¶
VVP Catalog¶
VVP Catalog is a catalog implementation that is integrated with Ververica Platform. It does not require additional setup steps and does not have any additional dependencies. VVP Catalog stores its metadata in Ververica Platform’s persistence layer.
A VVP Catalog mycatalog
with a default database mydb
is created with the following DDL statement:
CREATE CATALOG mycatalog
WITH (
'type' = 'vvp', -- mandatory
'defaultDatabase' = 'main' -- mandatory
);
VVP Catalog ensures that the configured default database is always present. The default database can be deleted but is automatically re-created as soon as the catalog is accessed again.
The following database objects are currently supported by VVP Catalog:
- Databases
- See Database DDL to learn how to create, alter, and drop databases.
- Tables
- See Tables & Views to learn how to create, alter, and drop tables.
- User-Defined Functions
- See Functions to learn how to implement and manage user-defined functions.
Note
User-defined functions can only be registered in VVP Catalogs.
Not supported features at this point are:
- Views
- Table Partitions
- Table, Column, and Partition Statistics
We will add these features in future releases depending on user feedback.
External Catalogs¶
External catalogs provide access to data stored in external storage systems via Flink SQL without the need to explicitly create tables. For that, an external catalog converts the metadata provided by the external system into Flink catalog metadata and makes it accessible to Apache Flink®.
Ververica Platform packages the following external catalogs:
Support for additional packaged catalogs, like Confluent® schema registery and JDBC, and custom external catalogs is planned in an upcoming release.
Apache Hive® Catalog¶
Apache Flink® Hive Catalog imports table metadata directly from your Apache Hive® Metastore. Once configured, you can read from and write into Hive tables with Flink SQL.
In order to create a Hive Catalog, you first need to make your hive-site.xml
available to Ververica Platform by mounting it into the Ververica Platform containers.
For example, you can store your hive-site.xml
in a Kubernetes ConfigMap,
apiVersion: v1
kind: ConfigMap
metadata:
name: hive-metastore-config-client
namespace: vvp
labels:
app.kubernetes.io/name: vvp
data:
hive-site.xml: |
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hive-metastore.hive.svc:9083</value>
</property>
...
</configuration>
that you then mount into Ververica Platform via the corresponding Helm chart values.
# values.yaml
volumeMounts:
- name: hive-site
mountPath: /etc/hive
volumes:
- name: hive-site
configMap:
name: hive-metastore-config-client
Afterwards, you can create a Hive Catalog via the CREATE CATALOG
DDL statement.
CREATE CATALOG hive_metastore
WITH (
'type' = 'hive', -- mandatory
'hive-conf-dir' = '/etc/hive', -- mandatory
'hive-version' = '3.1.2' -- mandatory
)
Ververica Platform supports Amazon S3 (via Hadoop’s S3 filesystem implementation s3a
) as well as Apache Hadoop® HDFS as Hive storage layer out-of-the-box.
If you are using a different storage layer, please reach out.
See also
Please see Apache Hive® for how to read from or write into Hive tables via Flink SQL.
Limitations¶
- Ververica Platform ships with Hive 3.1.2 only. To our knowledge this version is compatible with Hive 3.0.0 to 3.1.2. More versions will be supported in the future.
- Hive built-in functions and user-defined functions stored in Hive are not supported by Ververica Platform yet.
- Hive’s SQL dialect is not supported.
- The Hive Catalog is read-only.
ALTER TABLE
orCREATE TABLE
statements are not supported. If needed, table properties can be added on-demand via dynamic table options.