VERA Benchmark
Benchmarking VERA
This document presents a benchmarking analysis, comparing the performance of the Ververica Cloud: Managed Service platform against Apache Flink deployed on an Amazon EMR cluster. Additionally, it provides detailed guidelines on configuring the testing environment and executing performance evaluations.
Ververica has selected the Nexmark suite as the benchmarking tool of choice. This well-regarded benchmark, derived from the Google Cloud Dataflow benchmark suite, is specifically tailored for assessing the capabilities of streaming platforms. It encompasses a series of queries that simulate the demands of genuine streaming workloads, offering a robust framework for our comparative study.
Prerequisites
Before you begin, ensure you have the following prerequisites ready and available:
- Essential Tools:
  - curl: A command-line tool for transferring data, used to download the Nexmark release.
  - tar: A utility for archiving files, necessary for unpacking and installing software components.
- Unix/Linux Proficiency: A basic understanding of Unix/Linux shell commands is crucial. This will enable you to navigate the system, manage files, and execute scripts within the Unix/Linux environment.
Nexmark setup
Before you can run benchmarks, you'll need to prepare the Nexmark JAR file for use as a custom connector on both Ververica Cloud: Managed Service and Amazon EMR clusters.
Download the Nexmark JAR
The first step involves obtaining the latest version of Nexmark. As of the last update of this guide, version 0.2.0 was the most recent. You can download it using the curl command:
curl -L https://github.com/nexmark/nexmark/releases/download/v0.2.0/nexmark-flink.tgz --output ~/Downloads/nexmark-flink.tgz
You can also download the latest release version of Nexmark from the Nexmark releases page.
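If you prefer to pin the version in a script, the download step can be wrapped in a small helper. This is only a sketch: the function names nexmark_url and download_nexmark are made up for illustration, and it assumes that other releases follow the same URL layout as v0.2.0, which may not hold.

```shell
# Build the release URL for a given Nexmark version.
# Assumption: every release uses the same path layout as v0.2.0.
nexmark_url() {
  local version="$1"
  printf 'https://github.com/nexmark/nexmark/releases/download/v%s/nexmark-flink.tgz\n' "$version"
}

# Download the tarball into a target directory (default: ~/Downloads).
# -f makes curl fail on HTTP errors instead of saving an error page,
# -L follows the redirect to the actual release asset.
download_nexmark() {
  local version="$1"
  local dest_dir="${2:-$HOME/Downloads}"
  mkdir -p "$dest_dir"
  curl -fL "$(nexmark_url "$version")" --output "$dest_dir/nexmark-flink.tgz"
}
```

Calling download_nexmark 0.2.0 would then fetch the same file as the curl command above.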
Extract the JAR file
After downloading the tarball, proceed to extract its contents with the following commands:
cd ~/Downloads/
tar -xvzf nexmark-flink.tgz
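Before or after extracting, it can be useful to check which JAR files the archive actually contains. The following is a minimal sketch; list_archive_jars is a hypothetical helper name, not something shipped with Nexmark.

```shell
# List the JAR files inside a gzipped tarball without extracting it.
# tar -t lists entries, -z handles gzip compression, -f names the archive.
list_archive_jars() {
  local archive="$1"
  tar -tzf "$archive" | grep '\.jar$'
}
```

For example, list_archive_jars ~/Downloads/nexmark-flink.tgz prints one archive path per JAR, which shows the exact file name to expect under nexmark-flink/lib/.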
Verify the JAR file
Ensure that the JAR file has been correctly extracted and is present in the directory:
ls -alh nexmark-flink/lib/nexmark-flink-0.2.0.jar
With the nexmark-flink-0.2.0.jar file prepared, you will use it to create a custom connector that serves as the data source for the streaming queries in your benchmarks.
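The verification step can also be scripted without hard-coding the version suffix of the file name. The sketch below assumes the archive was extracted into a directory that contains lib/; find_nexmark_jar is a hypothetical helper, not part of Nexmark.

```shell
# Print the path of the Nexmark JAR under <dir>/lib, whatever its version suffix.
# Returns non-zero if no matching file exists.
find_nexmark_jar() {
  local base_dir="$1"
  local jar
  for jar in "$base_dir"/lib/nexmark-flink-*.jar; do
    # If the glob matched nothing, the literal pattern is left in $jar,
    # so check that the path is a real file before reporting it.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no nexmark-flink JAR found in $base_dir/lib" >&2
  return 1
}
```

Running find_nexmark_jar ~/Downloads/nexmark-flink prints the path to upload as the custom connector, or an error if the extraction did not produce a JAR.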
Ververica Cloud: Managed Service setup
Log in or register
- Access the Ververica Cloud: Managed Service portal.
- If you already have an account, proceed to log in.
- If you do not have an account, please register for a new account and then log in.
Create a workspace
- Once logged in, click on the New Workspace button to create a new workspace.
During the free-trial period, you are allowed to create only one workspace which comes with 10 Compute Unit (CU) credits. This allocation is sufficient for running the benchmarks as each Flink job will utilize 9 CUs.
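As a quick check of the numbers above, the following sketch computes how many benchmark jobs fit into the trial allocation at once. It assumes the 10 CUs act as a limit on concurrently used resources; max_concurrent_jobs is a hypothetical helper for illustration only.

```shell
# Number of jobs that fit into a CU budget at the same time (integer division).
max_concurrent_jobs() {
  local cu_budget="$1"
  local cu_per_job="$2"
  echo $(( cu_budget / cu_per_job ))
}

max_concurrent_jobs 10 9   # prints 1
```

Under that assumption, the trial workspace can run one 9-CU benchmark job at a time, so each query has to be stopped before the next one is started.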
Set up a Nexmark custom connector
- After your workspace is ready, follow these steps to set up the Nexmark custom connector:
- Click on your workspace.
- In the left-side navigation pane, click on Connectors.
- On the Connectors page, click on Create Connector.
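- In the dialog box, click on Click to select, navigate to the Nexmark JAR file in ~/Downloads/nexmark-flink/lib/ that you verified earlier, and select it. Depending on the release build, the file may be named nexmark-flink-0.2.0.jar or nexmark-flink-0.2-SNAPSHOT.jar.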
- Click Next.
- Name the connector nexmark.
- Click Finish.
You don't need to configure the Properties here; they will be provided when the connector is called from the Flink SQL job.
For more detailed instructions, refer to the Ververica Cloud custom connectors documentation.
Execute a SQL query
To create and run a Flink SQL query using the Nexmark source:
- Navigate to the SQL Editor page via the left-side navigation pane.
- Click on the New button.
- Select Blank Stream Draft, then click Next.
- Name your draft and click Create.

Initial SQL job
- Begin with the following query. Copy and paste it into the editor:
-- nexmark-q0, Pass Through
-- Measures the monitoring overhead including the source generator.
DROP TEMPORARY TABLE IF EXISTS nexmark_table;
CREATE TEMPORARY TABLE nexmark_table (
    event_type INT,
    person ROW<
        id BIGINT,
        name VARCHAR,
        emailAddress VARCHAR,
        creditCard VARCHAR,
        city VARCHAR,
        state VARCHAR,
        dateTime TIMESTAMP(3),
        extra VARCHAR>,
    auction ROW<
        id BIGINT,
        itemName VARCHAR,
        description VARCHAR,
        initialBid BIGINT,
        reserve BIGINT,
        dateTime TIMESTAMP(3),
        expires TIMESTAMP(3),
        seller BIGINT,
        category BIGINT,
        extra VARCHAR>,
    bid ROW<
        auction BIGINT,
        bidder BIGINT,
        price BIGINT,
        channel VARCHAR,
        url VARCHAR,
        dateTime TIMESTAMP(3),
        extra VARCHAR>,
    dateTime AS CASE
        WHEN event_type = 0 THEN person.dateTime
        WHEN event_type = 1 THEN auction.dateTime
        ELSE bid.dateTime
    END,
    WATERMARK FOR dateTime AS dateTime - INTERVAL '4' SECOND
)
WITH (
    'connector' = 'nexmark',
    'first-event.rate' = '10000000',
    'next-event.rate' = '10000000',
    'events.num' = '100000000',
    'person.proportion' = '2',
    'auction.proportion' = '6',
    'bid.proportion' = '92'
);

DROP TEMPORARY VIEW IF EXISTS bid;
CREATE TEMPORARY VIEW bid AS
SELECT
    bid.auction,
    bid.bidder,
    bid.price,
    bid.channel,
    bid.url,
    dateTime,
    bid.extra
FROM
    `default`.nexmark_table
WHERE
    event_type = 2;

DROP TEMPORARY TABLE IF EXISTS q0_sink;
CREATE TEMPORARY TABLE q0_sink (
    auction BIGINT,
    bidder BIGINT,
    price BIGINT,
    channel VARCHAR,
    url VARCHAR,
    dateTime TIMESTAMP(3),
    extra VARCHAR
)
WITH ('connector' = 'blackhole');

INSERT INTO q0_sink
SELECT
    auction, bidder, price, channel, url, dateTime, extra
FROM
    bid;
This is q0 (Pass Through), the first query in the Nexmark benchmark suite. It forwards every bid event unchanged to a blackhole sink, so it measures the monitoring overhead, including the cost of the source generator.
For more queries and their descriptions, explore the Nexmark GitHub repository.
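To get a feel for the source settings in the query above, the following sketch estimates how many events of each type the generator produces. It assumes that the three proportion options are relative weights over their sum (2 + 6 + 92 = 100) and that events.num is the total event count; expected_events is a hypothetical helper for illustration.

```shell
# Expected number of events of one type, given its proportion weight.
# Assumption: proportions are relative weights over their sum.
expected_events() {
  local total_events="$1"
  local weight="$2"
  local weight_sum="$3"
  echo $(( total_events * weight / weight_sum ))
}

total=100000000
sum=$(( 2 + 6 + 92 ))
expected_events "$total" 2 "$sum"    # persons:  2000000
expected_events "$total" 6 "$sum"    # auctions: 6000000
expected_events "$total" 92 "$sum"   # bids:     92000000
```

Under these assumptions, q0 passes roughly 92 million bid rows to the sink, because the bid view keeps only rows with event_type = 2.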
Deployment on Ververica Cloud: Managed Service
To deploy the SQL query:
- Save the query.
- Click on the Deploy button at the top-right corner of the page.
- Confirm deployment in the dialog box.