Data Ingestion
Engine support: VERA 4.1 (Flink 1.20).
Why YAML for CDC?
- Simplified job management – Declarative, human‑readable configuration.
- Reusability & consistency – Template values and reuse one config across environments.
- CI/CD‑friendly – Store in Git, review, promote, rollback.
- Environment separation – Swap credentials, topics, and URIs per environment.
- Faster onboarding – No knowledge of Flink internals required.
- Tooling compatibility – Validate/lint/test YAML.
- Separation of concerns – Data flow vs. runtime/platform config.
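To make the "validate/lint/test" point concrete, here is a minimal sketch of a check a CI pipeline could run before promoting a config. It assumes the YAML has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`). The helper name `lint_cdc_config`, the set of credential keys, and the "whole value must be a `${...}` reference" rule are illustrative assumptions, not part of VERA.

```python
import re

# A credential counts as a reference only if the whole value is one ${...} expression.
PLACEHOLDER = re.compile(r"\$\{[^}]+\}")

# Keys this sketch treats as credentials (an assumption, not a VERA rule).
CREDENTIAL_KEYS = ("username", "password")


def lint_cdc_config(cfg: dict) -> list[str]:
    """Return problems found in a parsed CDC config (hypothetical helper)."""
    problems = []
    for section in ("source", "sink"):
        block = cfg.get(section)
        if not isinstance(block, dict):
            problems.append(f"missing '{section}' section")
            continue
        if "type" not in block:
            problems.append(f"{section}: missing 'type'")
        for key in CREDENTIAL_KEYS:
            value = block.get(key)
            # Flag literal credentials; ${...} references are injected at deploy time.
            if value is not None and not PLACEHOLDER.fullmatch(str(value)):
                problems.append(f"{section}.{key}: hardcoded value, use a ${{...}} reference")
    return problems


config = {
    "source": {
        "type": "mysql",
        "username": "${secret_values.mysqlusername}",
        "password": "${secret_values.mysqlpassword}",
    },
    "sink": {"type": "mysql", "username": "root", "password": "pass"},
}

for problem in lint_cdc_config(config):
    print(problem)
# sink.username: hardcoded value, use a ${...} reference
# sink.password: hardcoded value, use a ${...} reference
```

Returning a list of messages rather than raising on the first problem lets a CI job report every issue in one run.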
Quick start (UI)
- Go to Data Ingestion → New Draft and select Blank Draft.
- Name the draft and pick your Engine Version (use the version the job was tested with).
- Paste a YAML CDC config in the Preview panel and click OK.
You can also create drafts from the SQL Editor or import files from your repo.
YAML schema overview
At minimum, a CDC YAML config contains a source section and a sink section. Each file can define one source and one sink; to run multiple flows, compose multiple files.
```yaml
source:
  type: <connector> # e.g., mysql, postgres, oracle, sqlserver, kafka
  name: <human name>
  hostname: <host or service>
  port: <int>
  username: ${secret_values.mysqlusername}
  password: ${secret_values.mysqlpassword}
  database: <db-name> # optional depending on connector
  tables: <regex or list> # e.g., 'mysql\.\.*' or [ db.schema.table1, db.schema.table2 ]
  server-id: <range> # connector-specific; example for MySQL
  snapshot.mode: initial # connector-specific snapshot policy

sink:
  type: <connector> # e.g., mysql, postgres, kafka, iceberg, hudi, jdbc
  name: <human name>
  hostname: <host-or-broker>
  port: <int>
  username: <user>
  password: <pass>
  database: <db>
  table: <table or pattern>
  upsert-key: <col or list> # for upsert sinks
```
Secrets & variables
Use ${…} expressions (e.g., ${secret_values.mysqlpassword}) to reference values injected at deploy time. Treat credentials as secrets. Do not hardcode.
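The platform performs this substitution for you at deploy time. The sketch below only illustrates what resolving `${secret_values.<name>}` references amounts to, assuming the secrets are available as a flat name-to-value mapping and that secret names use letters, digits, and underscores. `resolve_secrets` is a hypothetical helper, not VERA's implementation.

```python
import re

# Matches ${secret_values.<name>}; other ${...} forms are left untouched.
SECRET_REF = re.compile(r"\$\{secret_values\.([A-Za-z0-9_]+)\}")


def resolve_secrets(value, secrets: dict):
    """Recursively replace ${secret_values.x} references in a parsed config (hypothetical)."""
    if isinstance(value, dict):
        return {k: resolve_secrets(v, secrets) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_secrets(v, secrets) for v in value]
    if isinstance(value, str):
        def lookup(match):
            name = match.group(1)
            if name not in secrets:
                # Fail loudly instead of deploying with an unresolved credential.
                raise KeyError(f"secret '{name}' is not defined")
            return str(secrets[name])

        return SECRET_REF.sub(lookup, value)
    return value  # ints, bools, and None pass through unchanged


source = {
    "type": "mysql",
    "port": 3306,
    "username": "${secret_values.mysqlusername}",
    "password": "${secret_values.mysqlpassword}",
}
resolved = resolve_secrets(source, {"mysqlusername": "cdc_reader", "mysqlpassword": "s3cr3t"})
print(resolved["username"], resolved["port"])  # cdc_reader 3306
```

Raising on an undefined secret, rather than leaving the placeholder in place, mirrors the advice above: a job should never start with a literal or missing credential.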
Minimal example – MySQL → MySQL
```yaml
source:
  type: mysql
  name: Database A to Data warehouse
  hostname: mysql-src
  port: 3306
  username: ${secret_values.mysqlusername}
  password: ${secret_values.mysqlpassword}
  tables: mysql\.\.*

sink:
  type: mysql
  name: Database B to Data warehouse
  hostname: mysql-dst
  port: 3306
  username: root # literal credentials for brevity only;
  password: pass # use ${secret_values.<name>} references in real deployments
```
Common fields and patterns
Table selection
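- Single table: `tables: mydb.public.users`
- Multiple tables:
  ```yaml
  tables:
    - mydb.public.users
    - mydb.public.orders
  ```
- Regex pattern: `tables: mydb\.public\..*`
  Escape literal dots in the regex with a backslash, and leave the value unquoted or single-quoted: YAML double-quoted strings reject `\.` as an unknown escape sequence.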
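To show why the dots are escaped, the sketch below applies a `tables` value to a list of fully qualified names. It assumes a list means "exact names" and a string means a standard Python regular expression that must match the whole name. Connectors may apply their own conventions (some treat an unescaped dot as the database/table separator), so check the connector reference. `select_tables` and the candidate names are hypothetical.

```python
import re


def select_tables(candidates: list[str], selector) -> list[str]:
    """Return the candidates selected by a `tables` value (hypothetical helper).

    A list selects exact names; a string is treated as a regex that must
    match the entire qualified name (standard Python regex semantics assumed).
    """
    if isinstance(selector, list):
        wanted = set(selector)
        return [name for name in candidates if name in wanted]
    pattern = re.compile(selector)
    return [name for name in candidates if pattern.fullmatch(name)]


names = ["mydb.public.users", "mydb.public.orders", "mydb.audit.log", "mydbXpublicYusers"]

print(select_tables(names, ["mydb.public.users", "mydb.public.orders"]))
# ['mydb.public.users', 'mydb.public.orders']
print(select_tables(names, r"mydb\.public\..*"))
# ['mydb.public.users', 'mydb.public.orders']
print(select_tables(names, r"mydb.public..*"))  # unescaped dots match any character
# ['mydb.public.users', 'mydb.public.orders', 'mydbXpublicYusers']
```

The last call shows the failure mode the escaping guards against: under standard regex semantics, an unescaped dot also matches the `X` and `Y` in an unrelated name.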
Primary/Upsert keys
For sinks that support upserts, specify a key:
```yaml
sink:
  type: mysql
  table: dw.users
  upsert-key: id
```
Parallelism & checkpoints (runtime)
Runtime parameters are set on the deployment and can be overridden per job if supported:
```yaml
runtime:
  parallelism: 4
  checkpoint-interval: 60s
  restart-strategy: fixed-delay
```
Error handling
```yaml
on-error:
  drop: false # default; fail the job on deserialization errors
  dead-letter: # optional DLQ
    type: kafka
    topic: cdc-dlq
```
Exact runtime and error-handling keys vary by connector; consult the connector's reference documentation.
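The on-error block only names a policy. The sketch below models what the two settings could mean for a stream of records, under stated assumptions: records are JSON-encoded bytes, a list stands in for the dead-letter topic, and a configured dead-letter target takes precedence over `drop`. The names `process`, `decode`, and `DeserializationError` are hypothetical; the real behavior is defined by the connector.

```python
import json


class DeserializationError(Exception):
    """Raised when a raw change record cannot be decoded."""


def decode(raw: bytes) -> dict:
    # Stand-in decoder: JSON is an assumption made for this sketch.
    try:
        return json.loads(raw)
    except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
        raise DeserializationError(str(exc)) from exc


def process(records, drop=False, dead_letter=None):
    """Decode records under an assumed on-error policy (hypothetical model).

    Assumed precedence: a dead-letter target receives the bad record; otherwise
    drop=True skips it; otherwise the error propagates and the job fails.
    """
    decoded = []
    for raw in records:
        try:
            decoded.append(decode(raw))
        except DeserializationError:
            if dead_letter is not None:
                dead_letter.append(raw)  # stands in for producing to the DLQ topic
            elif not drop:
                raise  # drop: false and no DLQ -> fail the job
            # drop: true and no DLQ -> skip the record
    return decoded


records = [b'{"id": 1}', b"not json", b'{"id": 2}']
dlq = []
print(process(records, dead_letter=dlq))  # [{'id': 1}, {'id': 2}]
print(dlq)  # [b'not json']
```

Keeping the raw bytes in the dead-letter list, rather than an error message, preserves the original payload so it can be inspected and replayed later.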