Built-in Connectors

Apache Flink provides a variety of built-in connectors to facilitate the integration of Flink with different data sources and sinks (also called destinations). These connectors make it easy to read and write data from/to various systems in a scalable and fault-tolerant manner. In this section, we will introduce some of the most commonly used built-in connectors in Apache Flink.

note

The Console Network Detection feature lets you supply an IP address or a domain name to check whether the running environment of a fully managed Flink deployment can reach its upstream and downstream systems. See the FAQ section for more information.

Apache Paimon

Apache Paimon is a data lake storage layer designed for both streaming and batch processing. It supports high-throughput data writes and low-latency queries, and is built for compatibility with Flink-based Ververica Cloud. If you want to set up data lake storage on the Hadoop Distributed File System (HDFS) or on Ververica Cloud, Apache Paimon is a natural choice.

Read more here
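In Flink SQL, Paimon tables are typically accessed through a Paimon catalog. The sketch below registers one; the catalog name and warehouse path are placeholders you would replace with your own:

```sql
-- Register a Paimon catalog backed by a (hypothetical) HDFS warehouse path
CREATE CATALOG paimon_catalog WITH (
  'type' = 'paimon',
  'warehouse' = 'hdfs:///paimon/warehouse'
);

-- Tables created under this catalog are stored in Paimon format
USE CATALOG paimon_catalog;
```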

Apache Kafka

Apache Kafka is a distributed streaming platform designed for high-throughput, fault-tolerant, and scalable data streaming. Flink’s Kafka connector allows you to consume and produce data from and to Kafka topics.

Read more here
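A typical Kafka-backed table is declared with a `CREATE TABLE ... WITH (...)` statement. In this sketch the topic, broker address, and schema are illustrative placeholders:

```sql
-- Source/sink table backed by a (hypothetical) Kafka topic
CREATE TABLE orders (
  order_id   BIGINT,
  amount     DECIMAL(10, 2),
  order_time TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'broker:9092',
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset'  -- read the topic from the beginning
);
```

The same table can be read from with `SELECT` or written to with `INSERT INTO`, depending on whether it is used as a source or a sink.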

Upsert Kafka SQL Connector

The Upsert Kafka SQL Connector allows Apache Flink to integrate with Apache Kafka for reading and writing data using upsert semantics. This is particularly useful when working with changelog streams or streaming upserts, where each record represents an update or deletion of a previous record based on a primary key.

Read more here
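Unlike the regular Kafka connector, an upsert-kafka table must declare a primary key, which determines the Kafka record key used for updates and deletions. A minimal sketch, with placeholder topic and broker names:

```sql
-- Changelog table: each record upserts or deletes by primary key
CREATE TABLE user_scores (
  user_id STRING,
  score   BIGINT,
  PRIMARY KEY (user_id) NOT ENFORCED  -- required by upsert-kafka
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'user_scores',
  'properties.bootstrap.servers' = 'broker:9092',
  'key.format' = 'json',    -- serialization of the Kafka record key
  'value.format' = 'json'   -- serialization of the Kafka record value
);
```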

Amazon Kinesis Data Streams

Amazon Kinesis Data Streams is a managed, real-time data streaming service provided by Amazon Web Services (AWS). Flink’s Kinesis connector enables you to consume and produce data from and to Kinesis data streams.

Read more here
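Declaring a Kinesis-backed table follows the same pattern; note that option names can vary between connector versions, so treat the stream name and region below as illustrative:

```sql
-- Table backed by a (hypothetical) Kinesis data stream
CREATE TABLE clicks (
  user_id STRING,
  url     STRING
) WITH (
  'connector' = 'kinesis',
  'stream' = 'clicks',
  'aws.region' = 'us-east-1',
  'scan.stream.initpos' = 'LATEST',  -- start from the newest records
  'format' = 'json'
);
```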

DataGen

The DataGen connector in Apache Flink allows you to create tables with in-memory data generation, which is particularly useful for developing and testing queries locally without the need to access external systems such as Kafka. DataGen tables can include computed column syntax for flexible record generation.

Read more here
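For example, the following sketch generates a bounded sequence of synthetic rows entirely in memory, including a computed processing-time column; all names are illustrative:

```sql
-- Synthetic source: no external system required
CREATE TABLE source_data (
  id         BIGINT,
  price      DECIMAL(10, 2),            -- random values by default
  order_time AS PROCTIME()              -- computed column
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '10',
  'fields.id.kind' = 'sequence',        -- deterministic, bounded sequence
  'fields.id.start' = '1',
  'fields.id.end' = '1000'              -- source finishes after 1000 rows
);
```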

Faker

The Faker connector leverages the popular Java Faker library to generate random data based on predefined patterns. This allows you to create tables with data that closely resembles real-world data, enabling you to develop and test your Flink applications more effectively.

Read more here
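Each column of a Faker table is driven by a Java Faker expression. A minimal sketch, assuming the connector is registered under the name `faker`:

```sql
-- Columns populated from Java Faker expressions
CREATE TABLE people (
  name STRING,
  city STRING
) WITH (
  'connector' = 'faker',
  'fields.name.expression' = '#{Name.first_name}',
  'fields.city.expression' = '#{Address.city}'
);
```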

Elasticsearch

Elasticsearch is a distributed, RESTful search and analytics engine built on top of Apache Lucene. Flink’s Elasticsearch connector enables you to write data to Elasticsearch indices and perform real-time search and analytics operations on the stored data.

Read more here
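An Elasticsearch sink table can be declared as follows; declaring a primary key makes the sink write in upsert mode, using the key as the document ID. Host and index names are placeholders:

```sql
-- Upsert sink into a (hypothetical) Elasticsearch 7 index
CREATE TABLE es_sink (
  user_id STRING,
  visits  BIGINT,
  PRIMARY KEY (user_id) NOT ENFORCED  -- used as the document ID
) WITH (
  'connector' = 'elasticsearch-7',
  'hosts' = 'http://es-host:9200',
  'index' = 'user_visits'
);
```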

MySQL & MySQL CDC

Apache Flink provides built-in connectors for MySQL to enable both batch processing and real-time change data capture (CDC) from MySQL databases. This allows you to read and write data from and to MySQL databases, and capture changes in real time as they occur.

The JDBC-based MySQL connector supports batch-style reads and writes, while the MySQL CDC connector streams row-level changes from the database's binlog as they occur.

Read more here
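A CDC source table is declared with the `mysql-cdc` connector. In this sketch the connection details and table names are placeholders, and the password value is obviously hypothetical:

```sql
-- Streams inserts, updates, and deletes from a (hypothetical) MySQL table
CREATE TABLE mysql_orders (
  order_id BIGINT,
  amount   DECIMAL(10, 2),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'mysql-host',
  'port' = '3306',
  'username' = 'flink',
  'password' = 'secret',       -- placeholder credential
  'database-name' = 'shop',
  'table-name' = 'orders'
);
```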

PostgreSQL

While Apache Flink does not provide a dedicated built-in connector for PostgreSQL, you can still integrate Flink with PostgreSQL using the JDBC connector or the Change Data Capture (CDC) approach.

Read more here
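With the JDBC route, a PostgreSQL table is exposed through the generic `jdbc` connector. Connection details below are placeholders:

```sql
-- Reads from / writes to a (hypothetical) PostgreSQL table via JDBC
CREATE TABLE pg_users (
  id   BIGINT,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:postgresql://pg-host:5432/mydb',
  'table-name' = 'users',
  'username' = 'flink',
  'password' = 'secret'        -- placeholder credential
);
```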

Redis

Redis is an open-source, in-memory data structure store that can be used as a database, cache, and message broker. Flink’s Redis connector provides seamless integration with Redis, enabling you to read and write data from/to Redis data structures.

Read more here

Snowflake

Snowflake is a cloud-based data warehousing service designed to manage, share, and analyze large volumes of data with ease and efficiency. This sink-only connector is specifically engineered for scenarios where data is processed in Apache Flink before being transferred and persisted in Snowflake.

Read more here

StarRocks

StarRocks is a next-gen, high-performance analytical data warehouse that enables real-time, multi-dimensional, and highly concurrent data analysis.

Read more here

note

Most of the documentation about built-in connectors comes from the official Apache Flink® documentation.

Refer to the Credits page for more information.