
DataGen

Background Information

Datagen is a connector mainly used for debugging. It periodically generates random data matching the column types declared in the Datagen source table. If you need test data to quickly verify business logic during development or testing, you can use the Datagen connector to generate it. Datagen source tables can also use computed column syntax, which makes data generation more flexible.
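
As a minimal sketch of the computed column support mentioned above, the following table derives extra columns from the generated ones. The table name datagen_computed and all column names are hypothetical and chosen only for illustration.

    CREATE TABLE datagen_computed (
        id BIGINT,
        score INT,
        -- computed columns: derived from the generated fields, not generated themselves
        user_name AS CONCAT('user_', CAST(id AS STRING)),
        passed AS score >= 60,
        proc_time AS PROCTIME()
    ) WITH (
        'connector' = 'datagen'
    );

Only id and score are produced by the generator here; the remaining columns are evaluated from them for each row.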

The Datagen connector supports the following:

| Category | Description |
| --- | --- |
| Supported type | Source table |
| Running mode | Batch mode, stream mode |
| Data format | Not applicable |
| Monitoring metrics | Not yet supported |

Prerequisite

None.

Syntax

    CREATE TABLE datagen_source (
        name VARCHAR,
        score BIGINT
    ) WITH (
        'connector' = 'datagen'
    );

WITH Parameters

Source-specific parameters

| Parameter | Description | Data type | Required | Default | Remarks |
| --- | --- | --- | --- | --- | --- |
| connector | Source table type | String | Yes | None | The fixed value is datagen. |
| rows-per-second | The rate at which random data is generated | Long | No | 10000 (rows/second) | |
| number-of-rows | The total number of rows to generate | Long | No | None | By default, an unbounded source table is generated. If any field uses a sequence generator, the source ends as soon as the sequence of one such field is exhausted, and a bounded table is generated. |
| fields.<field>.kind | The generator type used for the field | String | No | random | Valid values: random (random generator), sequence (sequence generator). |
| fields.<field>.min | The minimum value of the generated random numbers | Same as the field type | No | Minimum value of the type | Valid only for fields whose kind is random. Only numeric types are supported. |
| fields.<field>.max | The maximum value of the generated random numbers | Same as the field type | No | Maximum value of the type | Same as fields.<field>.min. |
| fields.<field>.max-past | The maximum time in the past, relative to the local machine's current timestamp, when generating random timestamps | Duration | No | 0 | Only timestamp types are supported. |
| fields.<field>.length | The length of the generated random string, or the capacity of the generated collection | Integer | No | 100 | Supports the char/varchar/binary/varbinary/string/array/map/multiset types. |
| fields.<field>.start | The start value of the sequence generator | Same as the field type | No | None | None |
| fields.<field>.end | The end value of the sequence generator | Same as the field type | No | None | |

In the parameter names above, <field> is a placeholder for the name of the field the setting applies to.
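
The parameters above can be combined in one WITH clause. The sketch below is illustrative only: the table name datagen_tuned, the column names, and all option values are hypothetical, and it assumes that the field name goes between the two dots of each fields parameter (for example, fields.score.min for a column named score).

    CREATE TABLE datagen_tuned (
        name VARCHAR,
        score BIGINT,
        event_time TIMESTAMP(3)
    ) WITH (
        'connector' = 'datagen',
        -- throttle generation to 5 rows per second
        'rows-per-second' = '5',
        -- random strings of length 8 for the name column
        'fields.name.length' = '8',
        -- random scores between 0 and 100
        'fields.score.min' = '0',
        'fields.score.max' = '100',
        -- random timestamps up to one hour before the current time
        'fields.event_time.max-past' = '1 h'
    );

Since number-of-rows is not set and no field uses a sequence generator, this table is unbounded.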

Generators

Datagen currently provides two generators for producing data.

  • Random generator: generates random values. You can specify minimum and maximum values for the generated data.
  • Sequence generator: generates ordered values within a given range and stops when the sequence reaches its end value, so using a sequence generator produces a bounded table. You can specify start and end values for the sequence.

The supported generators for each type are as follows:

| Type | Supported generators | Remarks |
| --- | --- | --- |
| BOOLEAN | random | |
| CHAR | random/sequence | |
| VARCHAR | random/sequence | |
| BINARY | random/sequence | |
| VARBINARY | random/sequence | |
| STRING | random/sequence | |
| DECIMAL | random/sequence | |
| TINYINT | random/sequence | |
| SMALLINT | random/sequence | |
| INT | random/sequence | |
| BIGINT | random/sequence | |
| FLOAT | random/sequence | |
| DOUBLE | random/sequence | |
| DATE | random | Always uses the local machine's current date. |
| TIME | random | Always uses the local machine's current time. |
| TIMESTAMP | random | Generated within the maximum past time range relative to the local machine's current timestamp. |
| TIMESTAMP_LTZ | random | Same as TIMESTAMP. |
| ROW | random | Generates random subfields. |
| ARRAY | random | Generates random elements. |
| MAP | random | Generates random (key, value) pairs. |
| MULTISET | random | Generates random elements. |
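
To illustrate the sequence generator, here is a minimal sketch of a bounded table. The table name datagen_sequence, the column names, and the start and end values are hypothetical; the example assumes the field name is placed between the dots of each fields parameter.

    CREATE TABLE datagen_sequence (
        id BIGINT,
        label VARCHAR
    ) WITH (
        'connector' = 'datagen',
        -- id counts through a fixed range instead of being random
        'fields.id.kind' = 'sequence',
        'fields.id.start' = '1',
        'fields.id.end' = '100',
        -- label keeps the default random generator, with shorter strings
        'fields.label.length' = '5'
    );

Because id uses a sequence generator, the source is expected to end once that sequence is exhausted, even though number-of-rows is not set.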

Usage Example

Source example

Datagen is often used with the LIKE clause to simulate a table:

    CREATE TABLE Orders (
        order_number BIGINT,
        price DECIMAL(32,2),
        buyer ROW<first_name STRING, last_name STRING>,
        order_time TIMESTAMP(3)
    ) WITH (...);

    -- create a bounded mock table
    CREATE TEMPORARY TABLE GenOrders
    WITH (
        'connector' = 'datagen',
        'number-of-rows' = '10'
    )
    LIKE Orders (EXCLUDING ALL);

Note

This page is derived from the official Apache Flink® documentation.

Refer to the Credits page for more information.