Built-in Connectors

Apache Flink provides a variety of built-in connectors to facilitate the integration of Flink with different data sources and sinks (also called destinations). These connectors make it easy to read and write data from/to various systems in a scalable and fault-tolerant manner. In this section, we will introduce some of the most commonly used built-in connectors in Apache Flink.

note

You can use the Console Network Detection feature with an IP address or a domain name to check whether the running environment of a fully managed Flink deployment can reach its upstream and downstream systems. See the FAQ section for more information.

Apache Paimon

At its core, Apache Paimon is a dynamic data lake storage layer, streamlined for both streaming and batch data processing. It supports high-throughput data writing and low-latency data querying, and it is tailored for compatibility with Flink-based Ververica Cloud. If you're aiming to set up your data lake storage on the Hadoop Distributed File System (HDFS) or Ververica Cloud, Apache Paimon is your go-to solution.
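
As a minimal sketch, a Paimon table is typically created through a Paimon catalog in Flink SQL. The warehouse path, table name, and schema below are placeholder assumptions:

```sql
-- Register a Paimon catalog backed by a warehouse directory
-- (the HDFS path is a placeholder).
CREATE CATALOG paimon_catalog WITH (
  'type' = 'paimon',
  'warehouse' = 'hdfs:///path/to/warehouse'
);

USE CATALOG paimon_catalog;

-- A simple Paimon table; it can be written to continuously and
-- queried in both streaming and batch mode.
CREATE TABLE word_count (
  word STRING PRIMARY KEY NOT ENFORCED,
  cnt  BIGINT
);
```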

Read more here

Apache Kafka

Apache Kafka is a distributed streaming platform designed for high-throughput, fault-tolerant, and scalable data streaming. Flink's Kafka connector allows you to consume and produce data from and to Kafka topics.
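
For example, a Kafka topic can be exposed as a streaming table in Flink SQL. The topic name, broker address, and schema below are illustrative assumptions:

```sql
-- Reads JSON records from the 'orders' topic as a streaming table
-- (broker address and schema are placeholders).
CREATE TABLE orders (
  order_id STRING,
  amount   DOUBLE,
  ts       TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'orders-consumer',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);
```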

Read more here

Upsert Kafka SQL Connector

The Upsert Kafka SQL Connector allows Apache Flink to integrate with Apache Kafka for reading and writing data using upsert semantics. This is particularly useful when working with changelog streams or streaming upserts, where each record represents an update or deletion of a previous record based on a primary key.
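
A minimal sketch follows; note that the connector requires a primary key and separate key and value formats (the topic and schema here are assumptions):

```sql
-- Each record updates (or, with a null value, deletes) the row
-- identified by the primary key.
CREATE TABLE user_scores (
  user_id STRING,
  score   BIGINT,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'user-scores',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json'
);
```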

Read more here

Amazon Kinesis Data Streams

Amazon Kinesis Data Streams is a managed, real-time data streaming service provided by Amazon Web Services (AWS). Flink's Kinesis connector enables you to consume and produce data from and to Kinesis data streams.
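
As an illustrative sketch, a Kinesis stream can be declared as a table in Flink SQL; the stream name, region, and schema are assumptions:

```sql
-- Consumes JSON records from a Kinesis stream, starting from the
-- latest position (stream name and region are placeholders).
CREATE TABLE kinesis_orders (
  order_id STRING,
  amount   DOUBLE
) WITH (
  'connector' = 'kinesis',
  'stream' = 'orders-stream',
  'aws.region' = 'us-east-1',
  'scan.stream.initpos' = 'LATEST',
  'format' = 'json'
);
```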

Read more here

DataGen

The DataGen connector in Apache Flink allows you to create tables with in-memory data generation, which is particularly useful for developing and testing queries locally without the need to access external systems such as Kafka. DataGen tables can include computed column syntax for flexible record generation.
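
For instance, the following sketch generates a bounded sequence of IDs, random strings, and a computed timestamp column (the field options shown are a small subset of what the connector supports):

```sql
-- Generates rows in memory: a bounded sequence for 'id', random
-- 10-character strings for 'name', and a computed column 'ts'.
CREATE TABLE datagen_source (
  id   BIGINT,
  name STRING,
  ts AS LOCALTIMESTAMP  -- computed column, evaluated per row
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '5',
  'fields.id.kind' = 'sequence',
  'fields.id.start' = '1',
  'fields.id.end' = '1000',
  'fields.name.length' = '10'
);
```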

Read more here

Faker

The Faker connector leverages the popular Java Faker library to generate random data based on predefined patterns. This allows you to create tables with data that closely resembles real-world data, enabling you to develop and test your Flink applications more effectively.
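
As a sketch based on the community flink-faker connector, each field is described with a Java Faker expression; the table name and expressions below are illustrative:

```sql
-- Every field value is produced by evaluating a Java Faker expression.
CREATE TEMPORARY TABLE heroes (
  name  STRING,
  power STRING,
  age   INT
) WITH (
  'connector' = 'faker',
  'fields.name.expression'  = '#{superhero.name}',
  'fields.power.expression' = '#{superhero.power}',
  'fields.age.expression'   = '#{number.numberBetween ''0'',''100''}'
);
```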

Read more here

Elasticsearch

Elasticsearch is a distributed, RESTful search and analytics engine built on top of Apache Lucene. Flink's Elasticsearch connector enables you to write data to Elasticsearch indices and perform real-time search and analytics operations on the stored data.
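
A minimal sink sketch, assuming Elasticsearch 7 and placeholder host and index names; declaring a primary key makes the sink write in upsert mode:

```sql
-- Upserts documents into an Elasticsearch 7 index, keyed by user_id.
CREATE TABLE es_user_counts (
  user_id STRING,
  cnt     BIGINT,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'elasticsearch-7',
  'hosts' = 'http://localhost:9200',
  'index' = 'user_counts'
);
```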

Read more here

MySQL & MySQL CDC

Apache Flink provides built-in connectors for MySQL to enable both batch processing and real-time change data capture (CDC) from MySQL databases. This allows you to read and write data from and to MySQL databases, and capture changes in real time as they occur.

The MySQL connector reads from and writes to MySQL tables through Flink's JDBC connector, while the MySQL CDC connector streams row-level changes directly from the database's binlog, as sketched below.
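
Both styles are shown in the following sketch; the connection details, database, and schema are placeholder assumptions:

```sql
-- JDBC table: batch-style reads and keyed upsert writes against MySQL.
CREATE TABLE users_jdbc (
  id   INT,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/mydb',
  'table-name' = 'users',
  'username' = 'flink',
  'password' = 'secret'
);

-- CDC table: streams inserts, updates, and deletes from the binlog.
CREATE TABLE users_cdc (
  id   INT,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'flink',
  'password' = 'secret',
  'database-name' = 'mydb',
  'table-name' = 'users'
);
```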

Read more here

PostgreSQL

While Apache Flink does not provide a dedicated built-in connector for PostgreSQL, you can still integrate Flink with PostgreSQL using the JDBC connector or the Change Data Capture (CDC) approach.
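
The JDBC route looks much like the MySQL case, only with a PostgreSQL URL; the connection details and schema below are placeholders:

```sql
-- Reads from and writes to a PostgreSQL table via the JDBC connector.
CREATE TABLE pg_users (
  id   INT,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:postgresql://localhost:5432/mydb',
  'table-name' = 'users',
  'username' = 'flink',
  'password' = 'secret'
);
```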

Read more here

Redis

Redis is an open-source, in-memory data structure store that can be used as a database, cache, and message broker. Flink's Redis connector provides seamless integration with Redis, enabling you to read and write data from/to Redis data structures.

Read more here

note

Most of the documentation about built-in connectors comes from the official Apache Flink® documentation.

Refer to the Credits page for more information.