Reference Guide: Built-In Connectors
Apache Flink provides a variety of built-in connectors that integrate Flink with different data sources and sinks (also called destinations). These connectors let you read data from and write data to various systems in a scalable and fault-tolerant manner. This section introduces some of the most commonly used built-in connectors in Apache Flink.
With the Console Network Detection feature, you can use an IP address or a domain name to check whether the running environment of a fully managed Flink deployment is connected to the upstream and downstream systems. See the FAQ section for more information.
Apache Paimon
Apache Paimon is a dynamic data lake storage built for both streaming and batch data processing. It supports high-throughput data writing and low-latency data querying, and it is compatible with Flink-based Ververica Cloud. Use Apache Paimon when you want to set up your data lake storage on Hadoop Distributed File System (HDFS) or Ververica Cloud.
Apache Kafka
Apache Kafka is a distributed streaming platform designed for high-throughput, fault-tolerant, and scalable data streaming. Flink’s Kafka connector allows you to consume data from and produce data to Kafka topics.
Upsert Kafka SQL Connector
The Upsert Kafka SQL Connector allows Apache Flink to read data from and write data to Apache Kafka using upsert semantics. This is particularly useful when working with changelog streams or streaming upserts, where each record is an insert, an update, or a deletion for the row identified by its primary key.
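To make the upsert semantics concrete, the following is a minimal Python sketch of how a keyed changelog can be folded into the latest row per primary key. It is an illustration only, not the connector's implementation. It assumes that a record whose value is null (a tombstone) means deletion, and the names `apply_upserts`, `Record`, and `changelog` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# A changelog record is (primary key, value).
# Assumption: a value of None is a tombstone that deletes the row for that key.
Record = Tuple[str, Optional[dict]]


def apply_upserts(records: Iterable[Record]) -> Dict[str, dict]:
    """Fold a keyed changelog into the latest row per primary key."""
    table: Dict[str, dict] = {}
    for key, value in records:
        if value is None:
            # Deletion: drop the row; a tombstone for an unknown key is ignored.
            table.pop(key, None)
        else:
            # Insert the row if the key is new, otherwise overwrite it.
            table[key] = value
    return table


# Hypothetical sample data: two inserts, one update, one deletion.
changelog = [
    ("user-1", {"name": "Ada", "city": "London"}),
    ("user-2", {"name": "Linus", "city": "Helsinki"}),
    ("user-1", {"name": "Ada", "city": "Paris"}),  # update for user-1
    ("user-2", None),                              # deletion of user-2
]

print(apply_upserts(changelog))
# prints {'user-1': {'name': 'Ada', 'city': 'Paris'}}
```

Only the most recent value for each key survives, which is why a table read or written with upsert semantics needs a primary key.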
Amazon Kinesis Data Streams
Amazon Kinesis Data Streams is a managed, real-time data streaming service provided by Amazon Web Services (AWS). Flink’s Kinesis connector enables you to consume data from and produce data to Kinesis data streams.
DataGen
The DataGen connector in Apache Flink allows you to create tables whose data is generated in memory, which is particularly useful for developing and testing queries locally without access to external systems such as Kafka. DataGen tables can use computed columns for flexible record generation.
Faker
The Faker connector leverages the popular Java Faker library to generate random data based on predefined patterns. This allows you to create tables with data that closely resembles real-world data, enabling you to develop and test your Flink applications more effectively.
Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine built on top of Apache Lucene. Flink’s Elasticsearch connector enables you to write data to Elasticsearch indices and perform real-time search and analytics operations on the stored data.
MySQL & MySQL CDC
Apache Flink provides built-in connectors for MySQL that support both batch processing and real-time change data capture (CDC) from MySQL databases. The MySQL connector reads data from and writes data to MySQL databases using Flink’s JDBC connector, and the MySQL CDC connector captures changes in real time as they occur.
PostgreSQL
While Apache Flink does not provide a dedicated built-in connector for PostgreSQL, you can still integrate Flink with PostgreSQL using the JDBC connector or the Change Data Capture (CDC) approach.
Redis
Redis is an open-source, in-memory data structure store that can be used as a database, cache, and message broker. Flink’s Redis connector integrates with Redis, enabling you to read data from and write data to Redis data structures.
Snowflake
Snowflake is a cloud-based data warehousing service designed to manage, share, and analyze large volumes of data. This sink-only connector is built for scenarios where data is processed in Apache Flink and then transferred to and persisted in Snowflake.
StarRocks
StarRocks is a next-generation, high-performance analytical data warehouse that enables real-time, multi-dimensional, and highly concurrent data analysis.
Most of the documentation about built-in connectors comes from the official Apache Flink® documentation.
Refer to the Credits page for more information.