Data Ingestion
Engine support: VERA 4.1 (Flink 1.20).
Why YAML for CDC?
- Simplified job management – Declarative, human‑readable configuration.
- Reusability & consistency – Template values, reuse across envs.
- CI/CD‑friendly – Store in Git, review, promote, rollback.
- Environment separation – Swap credentials/topics/URIs per env.
- Faster onboarding – No Flink internals required.
- Tooling compatibility – Validate/lint/test YAML.
- Separation of concerns – Data flow vs. runtime/platform config.
Quick start (UI)
- Go to Data Ingestion → New Draft and select Blank Draft.
- Name the draft and pick your Engine Version (use the version the job was tested with).
- Paste a YAML CDC config in the Preview panel and click OK.
You can also create drafts from the SQL Editor or import files from your repo.
YAML schema overview
At minimum, a CDC YAML config contains a source section and a sink section. Each file defines one source and one sink; to run multiple flows, use one file per flow.
source:
  type: <connector>          # e.g., mysql, postgres, oracle, sqlserver, kafka
  name: <human name>
  hostname: <host or service>
  port: <int>
  username: ${secret_values.mysqlusername}
  password: ${secret_values.mysqlpassword}
  database: <db-name>        # optional depending on connector
  tables: <regex or list>    # e.g., "mysql\.\.*" or [ db.schema.table1, db.schema.table2 ]
  server-id: <range>         # connector-specific; example for MySQL
  snapshot.mode: initial     # connector-specific snapshot policy
sink:
  type: <connector>          # e.g., mysql, postgres, kafka, iceberg, hudi, jdbc
  name: <human name>
  hostname: <host-or-broker>
  port: <int>
  username: <user>
  password: <pass>
  database: <db>
  table: <table or pattern>
  upsert-key: <col or list>  # for upsert sinks
Secrets & variables. Use ${…} expressions (e.g., ${secret_values.mysqlpassword}) to reference values injected at deploy time. Treat credentials as secrets—do not hardcode.
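To make the deploy-time substitution concrete, here is a minimal Python sketch of how a ${namespace.key} expression could be resolved. It illustrates the idea only and is not the platform's implementation: the function name resolve_placeholders and the sample secret values are hypothetical.

```python
import re

# Matches ${namespace.key}, e.g. ${secret_values.mysqlpassword}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_]\w*)\.([A-Za-z_]\w*)\}")


def resolve_placeholders(text, namespaces):
    """Replace every ${namespace.key} in text; raise KeyError if a value is missing."""
    def lookup(match):
        namespace, key = match.group(1), match.group(2)
        try:
            return str(namespaces[namespace][key])
        except KeyError:
            raise KeyError(f"unresolved placeholder: {match.group(0)}") from None
    return PLACEHOLDER.sub(lookup, text)


# Hypothetical secret store contents, for illustration only.
secrets = {"secret_values": {"mysqlusername": "cdc_reader", "mysqlpassword": "s3cr3t"}}
line = "password: ${secret_values.mysqlpassword}"
resolved = resolve_placeholders(line, secrets)
```

Failing on an unresolved placeholder, rather than leaving it in place, keeps a missing secret from reaching the connector as a literal string.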
Minimal example – MySQL → MySQL
source:
  type: mysql
  name: Database A to Data warehouse
  hostname: mysql-src
  port: 3306
  username: ${secret_values.mysqlusername}
  password: ${secret_values.mysqlpassword}
  tables: mysql\.\.*
sink:
  type: mysql
  name: Database B to Data warehouse
  hostname: mysql-dst
  port: 3306
  username: root   # placeholder for brevity; reference a secret in real deployments
  password: pass   # placeholder for brevity; reference a secret in real deployments
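Because configs are stored in Git and can be linted, a small pre-deploy check can catch missing sections or hardcoded credentials before review. The Python sketch below assumes the YAML has already been parsed into a dict. The rules it applies (a required type key, credentials written as ${...} references) are illustrative assumptions, not the platform's validation logic, and lint_config is a hypothetical helper.

```python
REQUIRED_SECTIONS = ("source", "sink")
CREDENTIAL_KEYS = ("username", "password")


def lint_config(config):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for section in REQUIRED_SECTIONS:
        body = config.get(section)
        if not isinstance(body, dict):
            problems.append(f"missing section: {section}")
            continue
        if "type" not in body:
            problems.append(f"{section}: missing 'type'")
        for key in CREDENTIAL_KEYS:
            value = body.get(key)
            # Assumed rule: credentials must be ${...} references, not literals.
            if isinstance(value, str) and not value.startswith("${"):
                problems.append(f"{section}.{key}: hardcoded value, use a ${{...}} reference")
    return problems


# A parsed config whose sink credentials are hardcoded (illustrative values).
config = {
    "source": {
        "type": "mysql",
        "hostname": "mysql-src",
        "port": 3306,
        "username": "${secret_values.mysqlusername}",
        "password": "${secret_values.mysqlpassword}",
    },
    "sink": {
        "type": "mysql",
        "hostname": "mysql-dst",
        "port": 3306,
        "username": "root",
        "password": "pass",
    },
}
problems = lint_config(config)
```

Here problems holds one entry for sink.username and one for sink.password, and a config that uses secret references throughout returns an empty list.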
Common fields and patterns
Table selection
- Single table:
    tables: mydb.public.users
- Multiple tables:
    tables:
      - mydb.public.users
      - mydb.public.orders
- Regex pattern:
    tables: mydb\.public\..*
  (Escape literal dots in the pattern; an unescaped dot matches any character.)
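The effect of escaping can be checked offline. The Python sketch below assumes the pattern is matched against the full table name with standard regex semantics. A given connector may interpret dots differently, so treat it as an illustration rather than a description of connector behavior. The helper select_tables and the candidate names are hypothetical.

```python
import re


def select_tables(pattern, candidates):
    """Return the candidates whose full name matches the pattern."""
    return [name for name in candidates if re.fullmatch(pattern, name)]


candidates = [
    "mydb.public.users",
    "mydb.public.orders",
    "mydb.publicXusers",   # no separator after "public"
    "mydb.audit.users",
]

# Escaped dots match only a literal "." between the name parts.
escaped = select_tables(r"mydb\.public\..*", candidates)

# Unescaped dots match any character, so the pattern is looser than intended.
unescaped = select_tables(r"mydb.public..*", candidates)
```

With these inputs, escaped contains only the two mydb.public tables, while unescaped also picks up mydb.publicXusers.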
Primary/Upsert keys
For sinks that support upserts, specify a key:
sink:
  type: mysql
  table: dw.users
  upsert-key: id
Parallelism & checkpoints (runtime)
Runtime parameters are set on the deployment and can be overridden per job if supported:
runtime:
  parallelism: 4
  checkpoint-interval: 60s
  restart-strategy: fixed-delay
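The override behaviour can be pictured as a key-by-key merge in which job-level values win over deployment defaults. This Python sketch shows that precedence under the assumption that overrides are applied per key. The function effective_runtime and the override value are hypothetical.

```python
def effective_runtime(deployment_defaults, job_overrides):
    """Merge runtime settings; job-level keys take precedence over defaults."""
    merged = dict(deployment_defaults)  # copy so the defaults stay untouched
    merged.update(job_overrides)
    return merged


deployment_defaults = {
    "parallelism": 4,
    "checkpoint-interval": "60s",
    "restart-strategy": "fixed-delay",
}
job_overrides = {"parallelism": 8}  # hypothetical per-job override

runtime = effective_runtime(deployment_defaults, job_overrides)
```

Only parallelism changes in the result; checkpoint-interval and restart-strategy keep the deployment values.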
Error handling
on-error:
  drop: false        # default; fail the job on deserialization errors
  dead-letter:       # optional DLQ
    type: kafka
    topic: cdc-dlq
Exact runtime and error-handling keys vary by connector; check the connector’s reference for the supported set.
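As a mental model for the on-error settings, the Python sketch below routes a record that fails to deserialize. The precedence it uses (dead-letter first, then drop, otherwise fail) is an assumption made for illustration; actual behavior depends on the connector. The function handle_record and the in-memory dead_letters list are hypothetical stand-ins.

```python
import json


def handle_record(raw, on_error, dead_letters):
    """Deserialize one record, applying the assumed on-error policy on failure."""
    try:
        return json.loads(raw)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        dlq = on_error.get("dead-letter")
        if dlq is not None:
            # Assumed: a configured DLQ receives the raw record.
            dead_letters.append((dlq["topic"], raw))
            return None
        if on_error.get("drop", False):
            return None  # silently skip the bad record
        raise RuntimeError(f"deserialization failed: {raw!r}") from exc


dead_letters = []
with_dlq = {"drop": False, "dead-letter": {"type": "kafka", "topic": "cdc-dlq"}}

good = handle_record('{"id": 1}', with_dlq, dead_letters)
bad = handle_record("not json", with_dlq, dead_letters)
```

With no dead-letter section and drop left at false, the same bad record raises instead, which corresponds to failing the job.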