Apache Iceberg
This topic describes how to use the Apache Iceberg connector.
Background Information
Apache Iceberg is an open table format for data lakes. You can use Apache Iceberg to build your own data lake storage service on Hadoop Distributed File System (HDFS) or Amazon Simple Storage Service (S3). You can then use a compute engine from the open source big data ecosystem, such as Apache Flink, Apache Spark, Apache Hive, or Presto, to analyze the data in your data lake.
Features
Apache Iceberg provides the following core capabilities:
- Builds a low-cost lightweight data lake storage service based on HDFS or S3.
- Provides full atomicity, consistency, isolation, and durability (ACID) semantics.
- Supports time travel to historical table versions.
- Supports efficient data filtering.
- Supports schema evolution.
- Supports partition evolution.
You can use the efficient fault tolerance and stream processing capabilities of Flink to import a large amount of behavioral data in logs into an Apache Iceberg data lake in real time. Then, you can use Flink or another analytics engine to extract the value of your data.
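The ingestion scenario described above could look roughly like the following sketch. It assumes the behavioral logs arrive as JSON in a Kafka topic; the topic, broker address, consumer group, and table and column names are hypothetical placeholders. The Iceberg options mirror the ones used in the sample code in this topic. In open source Flink, the Iceberg sink commits data when a checkpoint completes, so this sketch also assumes that checkpointing is enabled for the deployment.

```sql
-- Hypothetical source: JSON behavior logs in a Kafka topic.
-- Topic, broker, and group names are placeholders, not values from this topic.
CREATE TEMPORARY TABLE behavior_log (
  user_id BIGINT,
  action STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'behavior_log',
  'properties.bootstrap.servers' = 'broker-host:9092',
  'properties.group.id' = 'iceberg-ingest',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
);

-- Sink: an existing Iceberg table with a matching schema (assumed to exist
-- in the Glue catalog; 'behavior_log' is a hypothetical table name).
CREATE TEMPORARY TABLE iceberg_behavior_log (
  user_id BIGINT,
  action STRING
) WITH (
  'connector' = 'iceberg',
  'catalog-name' = 'iceberg_db',
  'catalog-database' = 'iceberg_db',
  'catalog-table' = 'behavior_log',
  'catalog-impl' = 'org.apache.iceberg.aws.glue.GlueCatalog',
  'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO',
  'warehouse' = 's3://iceberg-testing-us-west-1/catalog/'
);

-- Continuously copy log records into the data lake.
INSERT INTO iceberg_behavior_log
SELECT user_id, action
FROM behavior_log;
```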
Limits
The Apache Iceberg connector is supported only in Ververica Cloud with VERA 1.0.3 or later. The connector supports only version 1 of the Apache Iceberg table format. For more information, see Iceberg Table Spec.
Syntax
CREATE TABLE iceberg_table (
  id BIGINT,
  data STRING
) WITH (
  'connector' = 'iceberg',
  ...
);
Parameters in the WITH Clause
Common Parameters
Parameters Only for Result Tables
Data Type Mappings
Sample Code
- Create an S3 bucket.
- Externally create an Iceberg catalog using the above S3 location.
- Add a Glue policy to your IAM role (if using Glue as backend catalog).
- Run the scripts.
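As one possible way to carry out the external catalog step, the following sketch uses the open source Iceberg catalog syntax for Flink SQL. It assumes an environment outside the connector scripts (for example, a Flink SQL client) in which the Iceberg Flink runtime and AWS dependencies are available; that environment is an assumption, not something this topic prescribes. The catalog, database, and table names are chosen to match the sample scripts in this topic, and the table is pinned to format version 1 because of the limit described earlier.

```sql
-- Sketch only: register a Glue-backed Iceberg catalog on the S3 location.
CREATE CATALOG iceberg_db WITH (
  'type' = 'iceberg',
  'catalog-impl' = 'org.apache.iceberg.aws.glue.GlueCatalog',
  'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO',
  'warehouse' = 's3://iceberg-testing-us-west-1/catalog/'
);

-- Database and table names match 'catalog-database' and 'catalog-table'
-- in the sample scripts.
CREATE DATABASE IF NOT EXISTS iceberg_db.iceberg_db;

CREATE TABLE IF NOT EXISTS iceberg_db.iceberg_db.sample (
  id INT,
  data STRING
) WITH (
  -- The connector supports only version 1 of the table format.
  'format-version' = '1'
);
```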
Sample Code for an Apache Iceberg Source Read Script
CREATE TEMPORARY TABLE iceberg_catalog_table (
  id INT,
  data STRING
) WITH (
  'connector'='iceberg',
  'catalog-name'='iceberg_db',
  'catalog-database'='iceberg_db',
  'catalog-table'='sample',
  'catalog-impl'='org.apache.iceberg.aws.glue.GlueCatalog',
  'io-impl'='org.apache.iceberg.aws.s3.S3FileIO',
  'warehouse'='s3://iceberg-testing-us-west-1/catalog/'
);

CREATE TEMPORARY TABLE sink (
  id INT,
  data STRING
) WITH (
  'connector' = 'blackhole'
);

INSERT INTO sink SELECT id, data
FROM iceberg_catalog_table;
Sample Code for an Apache Iceberg Sink Write Script
-- Source: randomly generated test rows.
CREATE TEMPORARY TABLE datagen (
  name VARCHAR,
  age INT
) WITH (
  'connector' = 'datagen'
);

-- Sink: the Iceberg table 'sample' in the Glue catalog.
CREATE TEMPORARY TABLE iceberg_catalog_table (
  id INT,
  data STRING
) WITH (
  'connector'='iceberg',
  'catalog-name'='iceberg_db',
  'catalog-database'='iceberg_db',
  'catalog-table'='sample',
  'catalog-impl'='org.apache.iceberg.aws.glue.GlueCatalog',
  'io-impl'='org.apache.iceberg.aws.s3.S3FileIO',
  'warehouse'='s3://iceberg-testing-us-west-1/catalog/',
  's3.staging-dir'='s3://iceberg-testing-us-west-1/staging/'
);

-- Columns map by position: age (INT) -> id, name (VARCHAR) -> data.
INSERT INTO iceberg_catalog_table SELECT age, name
FROM datagen;
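For a quick test of the sink script, it can help to bound the generated input so that the job writes a small, predictable amount of data and then finishes. The following variant of the datagen source is a minimal sketch that assumes the standard options of the open source Flink DataGen connector are available; the specific rates, row count, and value ranges are arbitrary illustrative choices.

```sql
-- Bounded test source: emits 100 rows at 5 rows per second, then stops.
CREATE TEMPORARY TABLE datagen (
  name VARCHAR,
  age INT
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '5',
  'number-of-rows' = '100',
  -- Keep generated values in a readable range (illustrative values).
  'fields.age.min' = '18',
  'fields.age.max' = '80',
  'fields.name.length' = '8'
);
```

The rest of the sink script stays the same; only the source definition changes.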