
Batch Mode

Ververica Platform is a unified processing system, designed to run both continuous streaming jobs and finite batch jobs. This topic explains the concepts behind batch processing and how to create and run a batch job.

Streaming Mode vs. Batch Mode

In Ververica, you can run jobs in two primary execution modes:

  • Streaming Mode (Default): This is for unbounded, 24/7 jobs that never stop. It's like a river, continuously processing new data as it arrives.
  • Batch Mode: This is for bounded, finite jobs that have a clear beginning and an end. The job processes a specific set of data (like "all sales from yesterday") and then automatically stops with a FINISHED status when complete.
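As a rough illustration of the bounded/unbounded distinction, the sketch below defines two sources with the Flink `datagen` connector. The table names are hypothetical. The only difference that matters here is the `number-of-rows` option: without it, the source keeps generating rows indefinitely, and with it, the source ends after the given number of rows.

```sql
-- Unbounded source: no 'number-of-rows', so rows are generated indefinitely.
-- (Hypothetical table name, for illustration only.)
CREATE TEMPORARY TABLE UnboundedEvents (
  id INT,
  payload STRING
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '5'   -- throttle the generator
);

-- Bounded source: 'number-of-rows' gives the data a defined end,
-- so a job reading only from this table can finish.
CREATE TEMPORARY TABLE BoundedEvents (
  id INT,
  payload STRING
) WITH (
  'connector' = 'datagen',
  'number-of-rows' = '100'
);
```

A batch job needs every source it reads to be bounded, as `BoundedEvents` is here.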

Why Use Batch Mode?

The primary benefit of batch mode is resource isolation. You can use Resource Queues to keep your batch jobs from interfering with your critical streaming jobs.

For example, you can create:

  • A Streaming Queue with guaranteed CUs for your 24/7 streaming applications.
  • A Batch Queue with its own separate CUs for your periodic reports or data transformations.

When you run a large batch job in the Batch Queue, it can only use the resources in that queue and cannot steal resources from your streaming applications.

Batch Job Workflow

Running a batch job is a two-stage process. First, you create and deploy a Batch Draft from the SQL Editor. This creates a runnable deployment. Second, you start that deployment from the Deployments page.

Step 1: Create and Deploy a Batch Draft

  1. Go to the SQL Editor.

  2. On the Drafts tab, click New > New Blank Batch Draft.

  3. Write your SQL query in the code editor workspace. For a simple test, you can use the following query:

    -- 1. Create a bounded (batch) source table
    CREATE TEMPORARY TABLE MyBatchSource (
      id INT,
      some_data STRING
    ) WITH (
      'connector' = 'datagen',
      'number-of-rows' = '10' -- This makes the source finite
    );

    -- 2. Create a sink table to print results
    CREATE TEMPORARY TABLE MyBatchSink (
      id INT,
      some_data STRING
    ) WITH (
      'connector' = 'print'
    );

    -- 3. Run the batch job
    INSERT INTO MyBatchSink
    SELECT * FROM MyBatchSource;

  4. Click Deploy in the toolbar.

  5. Enter any comments for this version, and click Confirm.

After confirming, your SQL deployment appears on the Deployments page, ready to be started.
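For something closer to a periodic report, a Batch Draft can also aggregate over a bounded source. The following is a minimal sketch, not a recommended schema: the table and column names are hypothetical, and the `fields.*.min` / `fields.*.max` settings are standard `datagen` options used only to keep the generated values in a small range.

```sql
-- Hypothetical bounded source standing in for "all sales from yesterday"
CREATE TEMPORARY TABLE SalesSource (
  region_id INT,
  amount INT
) WITH (
  'connector' = 'datagen',
  'number-of-rows' = '100',        -- finite input
  'fields.region_id.min' = '1',
  'fields.region_id.max' = '3',    -- only three distinct regions
  'fields.amount.min' = '1',
  'fields.amount.max' = '500'
);

-- Print sink for the per-region summary
CREATE TEMPORARY TABLE SalesSummary (
  region_id INT,
  order_count BIGINT,
  total_amount INT
) WITH (
  'connector' = 'print'
);

-- One output row per region
INSERT INTO SalesSummary
SELECT region_id, COUNT(*), SUM(amount)
FROM SalesSource
GROUP BY region_id;
```

When this runs in batch mode, each group's result should be written once, after all input has been read, rather than as a series of incremental updates.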

Note

Using SET 'execution.runtime-mode' = 'BATCH' in a streaming draft is not the same as creating a Batch Draft. Only drafts created with New > New Blank Batch Draft are recognized by the platform as true batch deployments.

Step 2: Start the Batch Job

  1. Go to the Deployments page.
  2. If needed, select BATCH from the deployment type filter dropdown to see your job.
  3. Find your new deployment in the list, and click Start in the Actions column.
  4. In the Start Job dialog, click Start.

The job will run, and its status will change from Starting to Finished after it has processed all the data.

Viewing Batch Job Results

For batch jobs that use a print sink, you can find the results in the Task Manager logs.

  1. On the Deployments page, find your Finished batch deployment.
  2. Click the Job ID to open its details page.
  3. Click the Diagnostics > Task Managers tab, and select the job from the Current TaskManager dropdown. A list of logs for that Task Manager appears.
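When several jobs or sinks write to the same logs, a prefix can make the printed rows easier to find. The sketch below assumes the platform's print connector supports the standard Flink `print-identifier` option. The table name and the identifier value are hypothetical.

```sql
-- Print sink whose output lines carry a searchable prefix.
-- 'print-identifier' is a standard Flink print connector option;
-- whether it is available on your platform version is an assumption.
CREATE TEMPORARY TABLE MyLabeledSink (
  id INT,
  some_data STRING
) WITH (
  'connector' = 'print',
  'print-identifier' = 'batch-demo'
);
```

Searching the Task Manager logs for the identifier (here `batch-demo`) should then narrow the output to rows written by this sink.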