Reference Guide: Built-in Formats

Apache Flink provides several built-in formats for reading data from sources and writing data to sinks (also known as destinations). A format defines how records are deserialized on read and serialized on write. This section lists the most commonly used built-in formats and links to the official Flink documentation for each, where you can explore their usage in more detail.

Avro

A binary serialization format that is compact, fast, and suitable for both data storage and data exchange between Flink jobs.

Confluent Avro

A format for reading and writing Avro records whose schemas are managed in the Confluent Schema Registry, typically together with Apache Kafka. It provides Avro serialization and deserialization schemas that integrate with the registry, which helps with managing and evolving Avro schemas.

Canal

Canal is an open-source change data capture (CDC) tool that reads the binary log of MySQL and MariaDB and streams row-level changes to downstream systems. Flink’s Canal format interprets the messages that Canal emits as INSERT, UPDATE, and DELETE changes.
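To make the shape of a Canal change event concrete, here is a minimal Python sketch of how a Canal JSON message could be turned into changelog entries. It is an illustration only, not Flink's implementation: the function name `canal_to_changelog` and the sample record are hypothetical, and the sketch assumes the usual Canal JSON layout with a `type` field, a `data` array of rows, and an `old` array that holds only the previous values of changed columns.

```python
import json


def canal_to_changelog(message):
    """Turn one Canal JSON message into (row kind, row) changelog entries.

    Hypothetical helper for illustration; row kinds use Flink's short
    names: +I insert, -U update-before, +U update-after, -D delete.
    """
    event = json.loads(message)
    kind = event["type"]
    rows = event.get("data") or []
    olds = event.get("old") or []
    changelog = []
    for i, row in enumerate(rows):
        if kind == "INSERT":
            changelog.append(("+I", row))
        elif kind == "DELETE":
            changelog.append(("-D", row))
        elif kind == "UPDATE":
            # "old" lists only the columns that changed, so overlay it on
            # the new row to rebuild the row as it was before the update.
            changed = olds[i] if i < len(olds) else {}
            changelog.append(("-U", {**row, **changed}))
            changelog.append(("+U", row))
    return changelog


# Invented sample message, not taken from a real database.
sample = json.dumps({
    "type": "UPDATE",
    "database": "inventory",
    "table": "products",
    "data": [{"id": "1", "name": "scooter", "weight": "5.2"}],
    "old": [{"weight": "3.1"}],
})

for entry in canal_to_changelog(sample):
    print(entry)
```

A single UPDATE message produces two entries, one retracting the old row and one adding the new row.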

CSV

A popular plain text format for structured data, where each record consists of one or more fields separated by a delimiter.
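As a small illustration of delimiter-separated records, the following Python sketch splits CSV text into fields using the standard `csv` module. The function name `parse_csv` and the sample data are hypothetical; the sketch only shows the general idea of a configurable field delimiter and quoting, not how Flink parses CSV internally.

```python
import csv
import io


def parse_csv(text, delimiter=","):
    """Split CSV text into a list of records, each a list of string fields."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row for row in reader]


# Invented sample: semicolon-delimited, with one quoted field that
# itself contains the delimiter.
sample = '1;scooter;5.2\n2;"car; battery";8.1\n'

for record in parse_csv(sample, delimiter=";"):
    print(record)
```

Quoting is what lets a field contain the delimiter itself, which is why the second record still has exactly three fields.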

Debezium

Debezium is an open-source platform that provides a change data capture (CDC) solution for various databases like MySQL, PostgreSQL, MongoDB, and more. Flink’s Debezium format interprets the change messages that Debezium produces, typically delivered through Apache Kafka, as INSERT, UPDATE, and DELETE changes.
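The following Python sketch shows, under stated assumptions, how a Debezium change event maps to changelog entries. It assumes the standard Debezium envelope with `before`, `after`, and `op` fields (`c` for create, `r` for snapshot read, `u` for update, `d` for delete). The function name `debezium_to_changelog` and the sample event are hypothetical, and detecting a schema wrapper by looking for a `payload` key is a simplification made for this example.

```python
import json


def debezium_to_changelog(message):
    """Turn one Debezium JSON event into (row kind, row) changelog entries.

    Hypothetical helper for illustration; row kinds use Flink's short
    names: +I insert, -U update-before, +U update-after, -D delete.
    """
    event = json.loads(message)
    # When the message embeds its schema, the envelope sits under "payload".
    payload = event.get("payload", event)
    op = payload["op"]
    before = payload.get("before")
    after = payload.get("after")
    if op in ("c", "r"):
        return [("+I", after)]
    if op == "u":
        return [("-U", before), ("+U", after)]
    if op == "d":
        return [("-D", before)]
    raise ValueError(f"unknown Debezium op: {op!r}")


# Invented sample event, not taken from a real database.
sample = json.dumps({
    "before": {"id": 111, "name": "scooter", "weight": 5.18},
    "after": {"id": 111, "name": "scooter", "weight": 5.15},
    "op": "u",
    "ts_ms": 1589362330904,
})

for entry in debezium_to_changelog(sample):
    print(entry)
```

Unlike Canal and Maxwell, the Debezium envelope carries the full previous row in `before`, so no merging of changed columns is needed.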

JSON

A widely used data interchange format that offers a lightweight and human-readable representation of structured data.

Maxwell

Maxwell is another change data capture (CDC) solution, primarily designed for MySQL. It captures row-level changes in the database and converts them into JSON format, which can be consumed by various downstream systems, including Apache Flink.
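For comparison, a Maxwell message describes a single row per event. The Python sketch below assumes the usual Maxwell JSON layout with a lowercase `type` field, a `data` object holding the row, and, for updates, an `old` object holding the previous values of the changed columns. The function name `maxwell_to_changelog` and the sample event are hypothetical, and the sketch is an illustration rather than Flink's implementation.

```python
import json


def maxwell_to_changelog(message):
    """Turn one Maxwell JSON event into (row kind, row) changelog entries."""
    event = json.loads(message)
    kind = event["type"]
    row = event["data"]
    if kind == "insert":
        return [("+I", row)]
    if kind == "delete":
        return [("-D", row)]
    if kind == "update":
        # "old" holds only the changed columns; overlay it on the new row
        # to rebuild the row as it was before the update.
        previous = {**row, **event.get("old", {})}
        return [("-U", previous), ("+U", row)]
    raise ValueError(f"unsupported Maxwell event type: {kind!r}")


# Invented sample event, not taken from a real database.
sample = json.dumps({
    "database": "inventory",
    "table": "products",
    "type": "update",
    "ts": 1596684883,
    "data": {"id": 111, "name": "scooter", "weight": 5.15},
    "old": {"weight": 5.18},
})

for entry in maxwell_to_changelog(sample):
    print(entry)
```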

Orc

A columnar storage file format that provides an efficient and high-performing solution for storing and processing large amounts of data.

Parquet

A columnar storage file format optimized for use with Hadoop and other distributed data processing frameworks.

Raw

The Raw format is a special format in Flink that allows you to read and write raw, unprocessed bytes. This format can be useful when you want to work with binary data without parsing or serialization.
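Even raw bytes need an interpretation once they become a column value, for example a character set for strings or a byte order for numbers. The Python sketch below illustrates that idea with two hypothetical helpers, `decode_raw_string` and `decode_raw_int`; the defaults chosen here (UTF-8 and big-endian) are assumptions made for this example, so check the Flink documentation for the options the Raw format actually exposes.

```python
import struct


def decode_raw_string(value, charset="utf-8"):
    """Interpret the whole byte payload as one string value."""
    return value.decode(charset)


def decode_raw_int(value, big_endian=True):
    """Interpret a 4-byte payload as one signed 32-bit integer."""
    fmt = ">i" if big_endian else "<i"
    return struct.unpack(fmt, value)[0]


# Invented sample payloads.
print(decode_raw_string(b"GET /index.html 200"))
print(decode_raw_int(b"\x00\x00\x00\x2a"))
```

The same four bytes decode to a different number under the other byte order, which is why the reader and the writer must agree on it.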

note

Most of the documentation about built-in connectors comes from the official Apache Flink® documentation.

Refer to the Credits page for more information.