Data Ingestion

Engine support: VERA 4.1 (Flink 1.20).

Why YAML for CDC?

  1. Simplified job management – Declarative, human‑readable configuration.
  2. Reusability & consistency – Template values, reuse across envs.
  3. CI/CD‑friendly – Store in Git, review, promote, rollback.
  4. Environment separation – Swap credentials/topics/URIs per env.
  5. Faster onboarding – No Flink internals required.
  6. Tooling compatibility – Validate/lint/test YAML.
  7. Separation of concerns – Data flow vs. runtime/platform config.

Quick start (UI)

  1. Go to Data Ingestion → New Draft and select Blank Draft.
  2. Name the draft and pick your Engine Version (choose the version the job was tested with).
  3. Paste a YAML CDC config in the Preview panel and click OK.

You can also create drafts from the SQL Editor or import files from your repo.

YAML schema overview

At minimum, a CDC YAML config contains source and sink sections. A job can include one source and one sink per file; compose multiple files for multiple flows.

source:
  type: <connector>        # e.g., mysql, postgres, oracle, sqlserver, kafka
  name: <human name>
  hostname: <host or service>
  port: <int>
  username: ${secret_values.mysqlusername}
  password: ${secret_values.mysqlpassword}
  database: <db-name>      # optional depending on connector
  tables: <regex or list>  # e.g., "mysql\.\.*" or [ db.schema.table1, db.schema.table2 ]
  server-id: <range>       # connector-specific; example for MySQL
  snapshot.mode: initial   # connector-specific snapshot policy

sink:
  type: <connector>        # e.g., mysql, postgres, kafka, iceberg, hudi, jdbc
  name: <human name>
  hostname: <host-or-broker>
  port: <int>
  username: <user>
  password: <pass>
  database: <db>
  table: <table or pattern>
  upsert-key: <col or list>  # for upsert sinks

Secrets & variables. Use ${…} expressions (e.g., ${secret_values.mysqlpassword}) to reference values injected at deploy time. Treat credentials as secrets—do not hardcode.
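
To make the substitution concrete, here is a minimal Python sketch of how a ${…} expression could be resolved before a config is used. The platform performs this step itself at deploy time; the resolve function, the regular expression, and the sample values are assumptions for illustration only, not the platform's implementation.

```python
import re

# Matches placeholders such as ${secret_values.mysqlpassword}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def resolve(text: str, values: dict[str, str]) -> str:
    """Replace each ${name} with values[name]; fail loudly on unknown names."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value provided for placeholder: {name}")
        return values[name]

    return PLACEHOLDER.sub(lookup, text)


# Hypothetical values; in a real deployment they come from the secret store.
secrets = {
    "secret_values.mysqlusername": "cdc_reader",
    "secret_values.mysqlpassword": "example-only",
}

line = "username: ${secret_values.mysqlusername}"
print(resolve(line, secrets))
```

Failing on an unknown name, rather than leaving the placeholder in place, surfaces a missing secret before the job starts instead of at connection time.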

Minimal example – MySQL → MySQL

source:
  type: mysql
  name: Database A to Data warehouse
  hostname: mysql-src
  port: 3306
  username: ${secret_values.mysqlusername}
  password: ${secret_values.mysqlpassword}
  tables: mysql\.\.*

sink:
  type: mysql
  name: Database B to Data warehouse
  hostname: mysql-dst
  port: 3306
  # Plain values shown for brevity only; reference secrets as in the source block.
  username: root
  password: pass
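
A config of this shape can be checked before it is pasted into a draft. The sketch below is a hypothetical pre-flight check in Python, operating on the config after it has been parsed into a dictionary. The function name and the rules it applies (both sections present, a type set, no plain-text password) are assumptions drawn from this page, not a validator shipped with the product.

```python
def validate_cdc_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic shape is fine."""
    problems = []
    for section in ("source", "sink"):
        block = config.get(section)
        if not isinstance(block, dict):
            problems.append(f"{section}: section is missing or not a mapping")
            continue
        if not block.get("type"):
            problems.append(f"{section}.type: connector type is not set")
        # Plain-text passwords are flagged; ${...} references are accepted.
        password = block.get("password")
        if isinstance(password, str) and not password.startswith("${"):
            problems.append(f"{section}.password: looks hardcoded, use a ${{...}} reference")
    return problems


# A parsed config with a secret reference in the source and a plain-text sink password.
config = {
    "source": {
        "type": "mysql",
        "hostname": "mysql-src",
        "port": 3306,
        "username": "${secret_values.mysqlusername}",
        "password": "${secret_values.mysqlpassword}",
    },
    "sink": {
        "type": "mysql",
        "hostname": "mysql-dst",
        "port": 3306,
        "username": "root",
        "password": "pass",
    },
}

problems = validate_cdc_config(config)
for problem in problems:
    print(problem)
```

A check like this fits naturally into a CI step that runs before a config is promoted between environments.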

Common fields and patterns

Table selection

  • Single table: tables: mydb.public.users

  • Multiple tables:

    tables:
    - mydb.public.users
    - mydb.public.orders
  • Regex pattern: tables: mydb\.public\..* (Escape literal dots so they are not read as regex wildcards.)
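
The effect of escaping the dots can be checked outside the platform. The Python sketch below assumes the pattern is applied as a standard regular expression against the whole table name. The connector's exact matching rules may differ, and the table names are made up for the example.

```python
import re


def select_tables(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates whose full name matches the regex pattern."""
    compiled = re.compile(pattern)
    return [name for name in candidates if compiled.fullmatch(name)]


# Hypothetical table names, including one that only differs by a separator.
tables = [
    "mydb.public.users",
    "mydb.public.orders",
    "mydb.audit.log",
    "mydbxpublic.users",
]

escaped = select_tables(r"mydb\.public\..*", tables)
unescaped = select_tables(r"mydb.public..*", tables)

print(escaped)    # only the two mydb.public tables
print(unescaped)  # also picks up mydbxpublic.users
```

With unescaped dots, each dot matches any character, so the pattern can capture tables from an unintended database or schema.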

Primary/Upsert keys

For sinks that support upserts, specify a key:

sink:
  type: mysql
  table: dw.users
  upsert-key: id
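
As a rough illustration of what an upsert key does, the Python sketch below folds a stream of rows into a table keyed by id, so a later row with the same key replaces the earlier one instead of adding a duplicate. It models the general upsert idea only; the function and sample rows are hypothetical, and real sinks may also handle deletes and ordering.

```python
def apply_upserts(rows: list[dict], upsert_key: str) -> dict:
    """Fold rows into a table keyed by upsert_key; later rows replace earlier ones."""
    table = {}
    for row in rows:
        table[row[upsert_key]] = row
    return table


# Hypothetical change rows arriving in order.
changes = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 1, "email": "a.new@example.com"},  # same key: replaces the first row
]

result = apply_upserts(changes, "id")
for key, row in result.items():
    print(key, row["email"])
```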

Parallelism & checkpoints (runtime)

Runtime parameters are set on the deployment and can be overridden per job if supported:

runtime:
  parallelism: 4
  checkpoint-interval: 60s
  restart-strategy: fixed-delay

Error handling

on-error:
  drop: false      # default; fail the job on deserialization errors
  dead-letter:     # optional DLQ
    type: kafka
    topic: cdc-dlq

Exact runtime and error-handling keys vary by connector; check the connector's reference for the options it supports.
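
The three outcomes implied by the on-error settings (fail, drop, or route to a dead-letter queue) can be sketched in Python as follows. The precedence used here, with a configured dead-letter target taking priority over drop, is an assumption for illustration, and the in-memory list stands in for the DLQ topic.

```python
import json


def process(raw_records: list[str], on_error: dict) -> tuple[list, list]:
    """Deserialize JSON records, applying an on-error policy to bad ones."""
    good, dead_letters = [], []
    for raw in raw_records:
        try:
            good.append(json.loads(raw))
        except json.JSONDecodeError:
            if on_error.get("dead-letter"):
                dead_letters.append(raw)  # stand-in for sending to the DLQ topic
            elif on_error.get("drop", False):
                continue  # skip the bad record and keep going
            else:
                raise  # default behavior: fail the job
    return good, dead_letters


records = ['{"id": 1}', "not-json", '{"id": 2}']

with_dlq = process(records, {"drop": False, "dead-letter": {"type": "kafka", "topic": "cdc-dlq"}})
with_drop = process(records, {"drop": True})

print(with_dlq)
print(with_drop)
```

Calling process(records, {"drop": False}) raises on the malformed record, which mirrors the default of failing the job.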