MySQL & MySQL CDC
This topic explains how to use the MySQL connector.
Background
The MySQL connector supports all databases that are compatible with the MySQL protocol, including Amazon RDS MySQL and self-managed MySQL databases.
The information supported by the MySQL connector is as follows.
Prerequisites
Source Table
- A network connection is established between your MySQL database and Ververica Platform: Self-Managed.
- The MySQL server meets the following requirements:
- The MySQL version is 5.6, 5.7, or 8.0.x.
- The binary log is enabled. (See the Enable Binlog Format section.)
- The binlog_row_image parameter is set to FULL.
- The interactive_timeout and wait_timeout parameters are configured in the MySQL configuration file. (See Configuring interactive_timeout and wait_timeout: https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_wait_timeout)
- A MySQL user is created and granted the SELECT, SHOW DATABASES, REPLICATION SLAVE, and REPLICATION CLIENT permissions. For example:
CREATE USER 'user'@'%' IDENTIFIED BY 'pwd';
GRANT SELECT, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'user'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
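To check the source table prerequisites, statements like the following can be run against the MySQL server. This is a minimal sketch that uses standard MySQL statements; 'user'@'%' is the example account from the GRANT statement above, and the expected values in the comments restate the requirements in this section rather than output captured from a real server.

```sql
-- Binary logging must be enabled.
SHOW VARIABLES LIKE 'log_bin';           -- expected: ON
-- The CDC source needs row-based logging with full row images.
SHOW VARIABLES LIKE 'binlog_format';     -- expected: ROW
SHOW VARIABLES LIKE 'binlog_row_image';  -- expected: FULL
-- Session timeouts, in seconds.
SHOW VARIABLES LIKE 'interactive_timeout';
SHOW VARIABLES LIKE 'wait_timeout';
-- Privileges held by the example user.
SHOW GRANTS FOR 'user'@'%';
```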
Dimension Table and Result Table
- A MySQL database and a MySQL table are created.
- An IP address whitelist is configured.
Limits
Transport Layer Security (TLS)
RDS on AWS does not support TLS. However, the MySQL connector uses TLS by default (because MySQL enables it by default). To resolve the mismatch, follow the instructions below.
- Reading data from a MySQL table: When you read data from a MySQL table in the default catalog, you must disable SSL for the JDBC connection. To do this, add the following property to your jdbc.properties. This setting ensures that the connection between your application and the MySQL database does not use SSL encryption.
jdbc.properties.useSSL=false
- Writing data to a MySQL table: When you write data to a MySQL table in the default catalog, you must disable TLS. To do this, override the connection URL by adding the useSSL=false parameter. You still need to provide the hostname and database name for validation purposes. The connection URL can also specify any other connection options supported by the JDBC driver. Here is an example of a connection URL with TLS disabled and the required hostname and database name:
jdbc:mysql://your_hostname:3306/your_database_name?useSSL=false
Replace your_hostname with the actual hostname of your MySQL server and your_database_name with the name of the database you want to connect to.
Change Data Capture (CDC) Source Table
MySQL CDC source tables do not support watermarks. If you need to perform window aggregation on a MySQL CDC source table, you can convert time fields into window values and use GROUP BY to aggregate the window values. For example, if you want to calculate the number of orders and sales per minute in a store, you can use the following code:
SELECT shop_id, DATE_FORMAT(order_ts, 'yyyy-MM-dd HH:mm'), COUNT(*), SUM(price)
FROM order_mysql_cdc
GROUP BY shop_id, DATE_FORMAT(order_ts, 'yyyy-MM-dd HH:mm');
Only MySQL users who are granted specific permissions can read full data and incremental data from a MySQL CDC source table. The permissions include SELECT, SHOW DATABASES, REPLICATION SLAVE, and REPLICATION CLIENT.
Quick Start
- MySQL tables can be used as a source table as follows:
CREATE TEMPORARY TABLE mysqlcdc_source (
  order_id INT,
  order_date TIMESTAMP(0),
  customer_name STRING,
  price DECIMAL(10, 5),
  product_id INT,
  order_status BOOLEAN,
  PRIMARY KEY(order_id) NOT ENFORCED
) WITH (
  'connector' = 'mysql',
  'hostname' = '<yourHostname>',
  'port' = '3306',
  'username' = '<yourUsername>',
  'password' = '<yourPassword>',
  'database-name' = '<yourDatabaseName>',
  'table-name' = '<yourTableName>'
);

CREATE TEMPORARY TABLE blackhole_sink(
  order_id INT,
  customer_name STRING
) WITH (
  'connector' = 'blackhole'
);

INSERT INTO blackhole_sink
SELECT order_id, customer_name FROM mysqlcdc_source;
- MySQL source tables use the MySQL CDC source. Confirm that the prerequisites have been met.
- Each MySQL CDC source table needs to be explicitly configured with a different Server ID. See more details in Precautions.
- MySQL tables can be used as a dimension table in a lookup join as follows:
CREATE TEMPORARY TABLE datagen_source(
  a INT,
  b BIGINT,
  c STRING,
  `proctime` AS PROCTIME()
) WITH (
  'connector' = 'datagen'
);

CREATE TEMPORARY TABLE mysql_dim (
  a INT,
  b VARCHAR,
  c VARCHAR
) WITH (
  'connector' = 'mysql',
  'hostname' = '<yourHostname>',
  'port' = '3306',
  'username' = '<yourUsername>',
  'password' = '<yourPassword>',
  'database-name' = '<yourDatabaseName>',
  'table-name' = '<yourTableName>'
);

CREATE TEMPORARY TABLE blackhole_sink(
  a INT,
  b STRING
) WITH (
  'connector' = 'blackhole'
);

INSERT INTO blackhole_sink
SELECT T.a, H.b
FROM datagen_source AS T JOIN mysql_dim FOR SYSTEM_TIME AS OF T.`proctime` AS H ON T.a = H.a;
- MySQL tables can be used as a result table as follows:
CREATE TEMPORARY TABLE datagen_source (
  `name` VARCHAR,
  `age` INT
) WITH (
  'connector' = 'datagen'
);

CREATE TEMPORARY TABLE mysql_sink (
  `name` VARCHAR,
  `age` INT
) WITH (
  'connector' = 'mysql',
  'hostname' = '<yourHostname>',
  'port' = '3306',
  'username' = '<yourUsername>',
  'password' = '<yourPassword>',
  'database-name' = '<yourDatabaseName>',
  'table-name' = '<yourTableName>'
);

INSERT INTO mysql_sink
SELECT * FROM datagen_source;
Precautions
CDC Source Table
Each MySQL CDC source table needs to be explicitly configured with a different Server ID.
- Why does MySQL CDC need a Server ID?
- Each client that synchronizes database data has a unique ID, the Server ID. The MySQL server maintains the network connection and binlog offset according to this ID. If a large number of clients with different Server IDs connect to the MySQL server at the same time, the CPU usage of the MySQL server may increase sharply, which affects the stability of online business. If multiple MySQL CDC source tables share the same Server ID and the source tables cannot be merged, the binlog offsets become confused, and too much or too little data is read. Server ID conflict errors may also occur. Therefore, we recommend that you configure a different Server ID for each MySQL CDC source table.
- How do I configure the Server ID?
- The Server ID can be specified in the DDL statement or configured through dynamic hints. We recommend that you configure the Server ID through dynamic hints.
- Configuration of Server IDs in Different Scenarios
- When the incremental snapshot framework is not enabled or the table's parallelism is 1, a specific Server ID can be specified.
SELECT * FROM source_table /*+ OPTIONS('server-id'='123456') */ ;
- When the incremental snapshot framework is enabled and the table's parallelism is greater than 1, you need to specify a range of Server IDs. Make sure that the number of available Server IDs in the range is not less than the parallelism. Assuming that the parallelism is 3, the range can be configured as follows:
SELECT * FROM source_table /*+ OPTIONS('server-id'='123456-123458') */ ;
- When CDC source tables are combined with CTAS for data synchronization and the source tables have the same configuration, the source tables are automatically merged. In this case, the same Server ID can be configured for multiple CDC source tables.
BEGIN STATEMENT SET;

CREATE TABLE IF NOT EXISTS `database1`.`user`
AS TABLE `mysql`.`tpcds`.`user`
/*+ OPTIONS('server-id'='8001-8004') */;

CREATE TABLE IF NOT EXISTS `database2`.`user`
AS TABLE `mysql`.`tpcds`.`user`
/*+ OPTIONS('server-id'='8001-8004') */;

END;
- When the job contains multiple MySQL CDC source tables and the CTAS statement is not used for synchronization, the source tables cannot be merged, and a different Server ID must be provided for each CDC source table.
SELECT * FROM
  source_table1 /*+ OPTIONS('server-id'='123456-123457') */
  LEFT JOIN
  source_table2 /*+ OPTIONS('server-id'='123458-123459') */
ON source_table1.id = source_table2.id;
Result Table
- You must declare at least one non-primary key column in the DDL statement. Otherwise, an error is returned.
- The parameter scan.incremental.snapshot.chunk.key-column is required when you use the MySQL connector as a source on a table that does not have a primary key.
- NOT ENFORCED indicates that Flink does not perform mandatory verification on the primary key. You must ensure the correctness and integrity of the primary key.
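For a source table without a primary key, the chunk key option might be set as in the following sketch. The table name orders_no_pk, its columns, and the choice of order_id as the chunk key are hypothetical; the connection values are the same placeholders used elsewhere on this page.

```sql
CREATE TEMPORARY TABLE orders_no_pk (
  order_id INT,
  customer_name STRING,
  price DECIMAL(10, 5)
) WITH (
  'connector' = 'mysql',
  'hostname' = '<yourHostname>',
  'port' = '3306',
  'username' = '<yourUsername>',
  'password' = '<yourPassword>',
  'database-name' = '<yourDatabaseName>',
  'table-name' = '<yourTableName>',
  -- No primary key is declared, so name the column used to split the table into chunks.
  'scan.incremental.snapshot.chunk.key-column' = 'order_id'
);
```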
DDL Syntax
CREATE TABLE mysqlcdc_source (
  order_id INT,
  order_date TIMESTAMP(0),
  customer_name STRING,
  price DECIMAL(10, 5),
  product_id INT,
  order_status BOOLEAN,
  PRIMARY KEY(order_id) NOT ENFORCED
) WITH (
  'connector' = 'mysql',
  'hostname' = '<yourHostname>',
  'port' = '3306',
  'username' = '<yourUsername>',
  'password' = '<yourPassword>',
  'database-name' = '<yourDatabaseName>',
  'table-name' = '<yourTableName>'
);
Enable Binlog Format
To use the MySQL catalog, you must enable the binary log in RDS. The underlying connector for this process is mysql-cdc. Create a new parameter group based on the existing AWS one, change binlog_format from OFF to ROW, and use this new parameter group when you set up RDS or reconfigure RDS after its creation.
Applying parameter group changes only works if the MySQL database is based on the Aurora engine. This applies both when creating and modifying a database. The instructions in this section will not work for a regular MySQL engine.
The steps are as follows:
- Open your Amazon RDS console.
- To create a new group, click Parameter groups in the left sidebar menu and then click Create parameter group.
- In the Parameter group family field, select the parameter group family you want to apply.

Select a group family that starts with the aurora- prefix. Otherwise, you cannot select anything in the Type field, which is available only for aurora- database families.
- In the Type field, select DB Cluster Parameter Group.
- Name your group and fill in the description field. Both are mandatory.
- Click Create.
- Click the name of your parameter group.
- Click Edit and type format in the search field of the Parameters section.
- Change the value of binlog_format to ROW and click Save changes.

If you want to create a database with a newly created parameter group:
- Click Create database.
- Scroll down to the Additional configuration section and extend the collapsible menu.
- In the DB cluster parameter group field, select your new group
If you want to modify a database with a newly created parameter group:
- Click Databases in the left sidebar menu.
- Click on your database >> Modify
- In the Additional configuration section, change the field value to your parameter group in the DB cluster parameter group field.
- Click Continue >> Modify cluster
- For the changes to take effect, you need to reboot all databases from the Actions menu.
Parameters in the WITH Clause
Common Parameters
Unique to Source Table
Unique to Dimension Table
Unique to Result Table
Data Type Mappings
CDC Source Table
Dimension Table and Result Table
More About MySQL CDC Source
Principles
A MySQL CDC source table is a streaming source table of MySQL databases. It first reads the full historical data from a database and then reads data from binary log files, which ensures data accuracy. If an error occurs, exactly-once semantics ensure data accuracy. You can run multiple deployments at the same time to read full data from a MySQL CDC source table by using the MySQL CDC connector. While the deployments are running, the incremental snapshot algorithm is used to perform lock-free and resumable reading.
The MySQL CDC source table provides the following features.
- Integrates stream processing and batch processing to support full and incremental data reading. This way, you do not need to separately implement stream processing and batch processing.
- Allows you to run multiple deployments at the same time to read full data from a MySQL CDC source table. This way, data can be read in a more efficient manner.
- Seamlessly switches between full and incremental data reading and supports automatic scale-in operations. This reduces computing resources that are consumed.
- Supports resumable reading during full data reading. This way, data can be read in a more stable manner.
- Reads full data without locks. This way, online business is not affected.
When a Ververica Platform: Self-Managed deployment starts, the MySQL CDC connector scans the full table and splits the table into multiple chunks based on the primary key. Then, the MySQL CDC connector uses the incremental snapshot algorithm to read the data from each chunk. The deployment periodically generates checkpoints to record the chunks whose data has been read. If a failover occurs, the MySQL CDC connector only needs to continue reading data from the chunks whose data has not been read. After the data of all chunks is read, incremental change records are read from the previously obtained binary log file position. The deployment continues to periodically generate checkpoints to record the binary log file position. If a failover occurs, the MySQL CDC connector resumes processing from the previous binary log file position. This way, exactly-once semantics are implemented. For more information about the incremental snapshot algorithm, see MySQL CDC Source.
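The chunk-based reading described above can be pictured as a series of range queries on the primary key. The following is only an illustrative sketch: the table orders, its key order_id, and the boundaries 1000 and 2000 are hypothetical, and the statements that the connector actually issues may differ.

```sql
-- Chunk 1: every row below the first boundary.
SELECT * FROM orders WHERE order_id < 1000;
-- Chunk 2: a bounded range between two boundaries.
SELECT * FROM orders WHERE order_id >= 1000 AND order_id < 2000;
-- Chunk 3: every row from the last boundary onward.
SELECT * FROM orders WHERE order_id >= 2000;
```

Because each chunk covers an independent key range, a checkpoint only has to record which chunks are finished, and a restarted deployment can skip them.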
Metadata
In most cases, access to metadata is required when you merge and synchronize tables in a sharded database. If you expect to identify data records by the source database names and table names after tables are merged, you can configure metadata columns in the data merging statement to read the source database name and table name of each data record. This way, you can identify the source of each data record after tables are merged.
The following table describes the metadata that you can access by using metadata columns.
The following sample code shows how to merge multiple orders tables in database shards of a MySQL instance into a MySQL table named mysql_orders and synchronize data from the MySQL table to a print table named print_orders.
CREATE TABLE mysql_orders (
  db_name STRING METADATA FROM 'database_name' VIRTUAL, -- Read the database name.
  table_name STRING METADATA FROM 'table_name' VIRTUAL, -- Read the table name.
  operation_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL, -- Read the change time.
  order_id INT,
  order_date TIMESTAMP(0),
  customer_name STRING,
  price DECIMAL(10, 5),
  product_id INT,
  order_status BOOLEAN,
  PRIMARY KEY(order_id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'flinkuser',
  'password' = 'flinkpw',
  'database-name' = 'mydb_.*', -- Use a regular expression to match multiple database shards.
  'table-name' = 'orders_.*' -- Use a regular expression to match multiple tables in the sharded database.
);

INSERT INTO print_orders SELECT * FROM mysql_orders;
Support for Regular Expressions
The MySQL CDC source table supports regular expressions in database names and table names to match multiple databases and tables. The following example specifies multiple tables by regular expression.
CREATE TABLE products (
  db_name STRING METADATA FROM 'database_name' VIRTUAL,
  table_name STRING METADATA FROM 'table_name' VIRTUAL,
  operation_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL,
  order_id INT,
  order_date TIMESTAMP(0),
  customer_name STRING,
  price DECIMAL(10, 5),
  product_id INT,
  order_status BOOLEAN,
  PRIMARY KEY(order_id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'root',
  'password' = '123456',
  'database-name' = '(^(test).*|^(tpc).*|txc|.*[p$]|t{2})', -- Match multiple databases
  'table-name' = '(t[5-8]|tt)' -- Match multiple tables
);
The regular expressions in the example above are interpreted as follows:
- ^(test).* is an example of prefix matching. This expression matches database names that begin with test, such as test1 or test2.
- .*[p$] is an example of suffix matching. This expression matches database names that end in p, such as cdcp or edcp.
- txc is an exact match. It matches the database named txc.
When the MySQL CDC source table matches a fully qualified table name, it uniquely identifies a table by the database name and table name together. That is, it uses database-name.table-name as the matching pattern. For example, the pattern (^(test).*|^(tpc).*|txc|.*[p$]|t{2}).(t[5-8]|tt) can match the tables txc.tt and test2.test5.
table-name and database-name also accept multiple tables or databases separated by commas, such as 'table-name' = 'mytable1,mytable2'. However, this conflicts with commas inside a regular expression. If a regular expression contains a comma, rewrite it with the vertical bar (|) instead. For example, mytable_\d{1,2} must be written as the equivalent regular expression (mytable_\d{1}|mytable_\d{2}) to avoid the comma.
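As an illustration of the two forms, the following sketch shows a comma-separated table list next to the comma-free rewrite of a quantified pattern. The table names orders_by_list and orders_by_regex and the single-column schema are hypothetical, and the connection values are placeholders.

```sql
-- Comma-separated list: reads mytable1 and mytable2.
CREATE TEMPORARY TABLE orders_by_list (
  order_id INT,
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = '<yourHostname>',
  'port' = '3306',
  'username' = '<yourUsername>',
  'password' = '<yourPassword>',
  'database-name' = '<yourDatabaseName>',
  'table-name' = 'mytable1,mytable2'
);

-- Regular expression without a comma: the {1,2} quantifier is spelled out with |.
CREATE TEMPORARY TABLE orders_by_regex (
  order_id INT,
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = '<yourHostname>',
  'port' = '3306',
  'username' = '<yourUsername>',
  'password' = '<yourPassword>',
  'database-name' = '<yourDatabaseName>',
  'table-name' = '(mytable_\d{1}|mytable_\d{2})'
);
```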
Parallelism Control
The MySQL CDC connector can run multiple deployments at the same time to read full data. This improves the data loading efficiency. If you use the MySQL CDC connector with the Autopilot feature that is provided by Ververica Platform: Self-Managed, automatic scale-in can be performed during incremental data reading after parallel reading is complete. This saves computing resources.
You can configure the Parallelism parameter in Basic mode in the Resource Configuration panel.
When you configure the Parallelism parameter in Basic mode, make sure that the number of server IDs in the range specified by server-id in the table is greater than or equal to the value of the Parallelism parameter. For example, if the range of server-id is 5404-5412, nine unique server IDs can be used because both ends of the range are included. Therefore, you can configure a maximum of nine parallel deployments. In addition, the ranges specified by server-id for the same MySQL instance in different deployments cannot overlap. Each deployment must be explicitly configured with a unique server ID.
Automatic Scale-in by Using Autopilot
When full data is read, a large amount of historical data is accumulated. In most cases, Ververica Platform: Self-Managed reads historical data in parallel to improve reading efficiency. When incremental data is read, only a single deployment is required to read data because the amount of binary log data is small and the global order must be ensured. The numbers of compute units (CUs) that are required during full data reading and incremental data reading are different. You can use the Autopilot feature to balance performance and resource consumption.
Autopilot monitors the traffic for each task that is used by the MySQL CDC source table. If the binary log data is read in only one task and other tasks are idle during incremental data reading, Autopilot automatically reduces the number of CUs and the parallelism. To enable Autopilot, you need only to set the Mode parameter of Autopilot to Active on the Deployments page.
By default, the minimum interval at which the decrease of the parallelism is triggered is set to 24 hours.
Startup Mode
Specify the startup mode of the MySQL CDC source table with the scan.startup.mode option. Valid values include:
- initial: When the MySQL CDC connector starts for the first time, Ververica Platform: Self-Managed scans all historical data and then reads the most recent binary log data. This is the default value.
- latest-offset: When the MySQL CDC connector starts for the first time, Ververica Platform: Self-Managed reads binary log data from the most recent offset instead of scanning all historical data. This way, Ververica Platform: Self-Managed reads only the incremental data that is generated after the connector starts.
- earliest-offset: When the MySQL CDC connector starts for the first time, Ververica Platform: Self-Managed reads binary log data from the earliest offset instead of scanning all historical data.
- specific-offset: When the MySQL CDC connector starts for the first time, Ververica Platform: Self-Managed reads binary log data from a specific offset instead of scanning all historical data. To start from a specific binlog file name and position, set scan.startup.specific-offset.file and scan.startup.specific-offset.pos. To start from a specific GTID set, set scan.startup.specific-offset.gtid-set.
- timestamp: When the MySQL CDC connector starts for the first time, Ververica Platform: Self-Managed reads binary log data from a specific timestamp instead of scanning all historical data. The timestamp is specified by scan.startup.timestamp-millis in milliseconds.
Example usage:
CREATE TABLE specific_binlog_offset (...)
WITH (
  'connector' = 'mysql-cdc',
  'scan.startup.mode' = 'specific-offset',
  'scan.startup.specific-offset.file' = 'mysql-bin.000003',
  'scan.startup.specific-offset.pos' = '4',
  ...
);

CREATE TABLE specific_gtid (...)
WITH (
  'connector' = 'mysql-cdc',
  'scan.startup.mode' = 'specific-offset',
  'scan.startup.specific-offset.gtid-set' = '24DA167-0C0C-11E8-8442-00059A3C7B00:1-19',
  ...
);

CREATE TABLE specific_timestamp (...)
WITH (
  'connector' = 'mysql-cdc',
  'scan.startup.mode' = 'timestamp',
  'scan.startup.timestamp-millis' = '1667232000000',
  ...
);
- The MySQL CDC source table logs the current position at the INFO level during each checkpoint, with the log prefix Binlog offset on checkpoint {checkpoint-id}. This can help you start the job from a certain checkpoint position.
- If the schema of the table being read has ever changed, an error may occur when starting from the earliest offset (earliest-offset), a specific offset (specific-offset), or a timestamp (timestamp). Because the Debezium reader internally keeps only the latest table schema, earlier data that does not match that schema cannot be parsed correctly.
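For the modes that need no extra options, only scan.startup.mode changes. The sketch below uses latest-offset; the table name orders_latest and its columns are hypothetical, and the connection values are placeholders.

```sql
CREATE TEMPORARY TABLE orders_latest (
  order_id INT,
  customer_name STRING,
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = '<yourHostname>',
  'port' = '3306',
  'username' = '<yourUsername>',
  'password' = '<yourPassword>',
  'database-name' = '<yourDatabaseName>',
  'table-name' = '<yourTableName>',
  -- Skip the historical snapshot and read only changes made after startup.
  'scan.startup.mode' = 'latest-offset'
);
```

Replacing the value with earliest-offset follows the same pattern, subject to the schema-change caveat noted above.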
FAQ
If the Flink job fails to start and the root cause of the exception is:
Caused by: javax.net.ssl.SSLHandshakeException: No appropriate protocol (protocol is disabled or cipher suites are inappropriate)
try adding 'jdbc.properties.useSSL' = 'false' to the connector's parameters.
For example:
CREATE TEMPORARY TABLE MysqlTable (
  id BIGINT,
  name VARCHAR(128),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql',
  'hostname' = 'xxx',
  'port' = 'xxx',
  'database-name' = 'xxx',
  'table-name' = 'xxx',
  'username' = 'xxx',
  'password' = 'xxx',
  'jdbc.properties.useSSL' = 'false'
);