DataGen
Background information
DataGen is a connector used mainly for debugging. It periodically generates random data that matches the column types declared in the DataGen source table. If you need test data to quickly verify business logic during development or testing, you can use the DataGen connector to generate it. DataGen source tables also support computed column syntax, which makes data generation more flexible.
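As a rough illustration of the computed column point, the sketch below derives extra columns from randomly generated ones. The table and column names (datagen_users, raw_score, and so on) are hypothetical, and the option keys assume the fields.<field name>.min and fields.<field name>.max form from the parameter table later on this page.

```sql
-- Hypothetical table: only id and raw_score are produced by the generator.
CREATE TEMPORARY TABLE datagen_users (
  id INT,
  raw_score INT,
  -- Computed columns are evaluated from the generated physical columns.
  user_name AS CONCAT('user_', CAST(id AS STRING)),
  grade AS CASE WHEN raw_score >= 60 THEN 'pass' ELSE 'fail' END,
  proc_time AS PROCTIME()
) WITH (
  'connector' = 'datagen',
  'fields.id.min' = '1',
  'fields.id.max' = '1000',
  'fields.raw_score.min' = '0',
  'fields.raw_score.max' = '100'
);
```

Only the physical columns need generator options; the computed columns follow from them, which keeps derived test data consistent with its inputs.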
The DataGen connector supports the following:
Category | Description |
---|---|
Supported type | Source table |
Running mode | Batch mode, streaming mode |
Data format | Not applicable |
Monitoring metrics | Not supported yet |
API type | SQL |
Prerequisites
None.
Syntax
CREATE TABLE datagen_source (
  name VARCHAR,
  score BIGINT
) WITH (
  'connector' = 'datagen'
);
WITH parameters
Source-specific parameters
Parameter | Description | Data type | Required | Default | Remarks |
---|---|---|---|---|---|
connector | Type of the source table | String | Yes | None | The value is fixed to datagen |
rows-per-second | Rate at which random rows are generated | Long | No | 10000 (rows per second) | |
number-of-rows | Total number of rows to generate | Long | No | None | By default, the source table is unbounded. If any field uses a sequence generator, the source ends as soon as the sequence of one such field is exhausted, and the table is bounded. |
fields.#.kind | Generator used for the field | String | No | random | Valid values: random (random generator) and sequence (sequence generator). In this and the following parameters, # stands for the field name. |
fields.#.min | Minimum value of the generated random numbers | Same type as the field | No | Minimum value of the type | Applies only to fields whose kind is random. Only numeric types are supported. |
fields.#.max | Maximum value of the generated random numbers | Same type as the field | No | Maximum value of the type | Same as fields.#.min |
fields.#.max-past | Maximum offset into the past, relative to the local machine’s current timestamp, when generating random timestamps | Duration | No | 0 | Only timestamp types are supported |
fields.#.length | Length of the generated random string or size of the generated collection | Integer | No | 100 | Supports the char/varchar/binary/varbinary/string/array/map/multiset types |
fields.#.start | Start value of the sequence generator | Same type as the field | No | None | |
fields.#.end | End value of the sequence generator | Same type as the field | No | None | |
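The following is a minimal sketch of how several of these parameters might be combined for randomly generated fields. The table name datagen_random and its columns are hypothetical, and the duration literal '1 h' assumes the usual Flink duration format.

```sql
-- Hypothetical source: at most 100 random rows, emitted at about 5 rows per second.
CREATE TEMPORARY TABLE datagen_random (
  name VARCHAR,
  score INT,
  event_time TIMESTAMP(3)
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '5',
  'number-of-rows' = '100',
  -- Random strings of 8 characters instead of the default 100.
  'fields.name.length' = '8',
  -- Random integers limited to the range 0 to 100.
  'fields.score.min' = '0',
  'fields.score.max' = '100',
  -- Random timestamps no more than one hour before the current time.
  'fields.event_time.max-past' = '1 h'
);
```

Fields without explicit options fall back to the defaults listed in the table, so only the fields whose range or length matters need to be configured.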
Generators
DataGen currently supports two generators:
- Random generator: generates random values. You can specify minimum and maximum values for the generated data.
- Sequence generator: generates ordered values within a range and stops when the sequence reaches its end value, so a table that uses a sequence generator is bounded. You can specify the start and end values of the sequence.
The generators supported for each type are as follows:
Type | Supported generators | Additional info |
---|---|---|
BOOLEAN | random | |
CHAR | random/sequence | |
VARCHAR | random/sequence | |
BINARY | random/sequence | |
VARBINARY | random/sequence | |
STRING | random/sequence | |
DECIMAL | random/sequence | |
TINYINT | random/sequence | |
SMALLINT | random/sequence | |
INT | random/sequence | |
BIGINT | random/sequence | |
FLOAT | random/sequence | |
DOUBLE | random/sequence | |
DATE | random | Always uses the local machine’s current date |
TIME | random | Always uses the local machine’s current time |
TIMESTAMP | random | Generated within the max-past range relative to the local machine’s current timestamp |
TIMESTAMP_LTZ | random | Same as TIMESTAMP |
ROW | random | Generates random subfields |
ARRAY | random | Generates random elements |
MAP | random | Generates random (key, value) pairs |
MULTISET | random | Generates random elements |
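To make the difference between the two generators concrete, here is a small sketch that mixes a sequence field with random fields. The table and column names are hypothetical. Because seq_id uses a sequence generator, the source should stop once that sequence is exhausted.

```sql
-- Hypothetical bounded source driven by a sequence field.
CREATE TEMPORARY TABLE datagen_bounded (
  seq_id INT,
  flag BOOLEAN,
  tags ARRAY<STRING>
) WITH (
  'connector' = 'datagen',
  -- Sequence generator: emits the values 1 through 5, then the source ends.
  'fields.seq_id.kind' = 'sequence',
  'fields.seq_id.start' = '1',
  'fields.seq_id.end' = '5',
  -- Random generator (the default kind): each array holds 3 random elements.
  'fields.tags.length' = '3'
);
```

BOOLEAN and ARRAY support only the random generator, so they need no kind option here. The order in which sequence values arrive may vary when the source runs with a parallelism greater than one.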
Usage example
Source example
DataGen is often used with the LIKE clause to mock an existing table:
CREATE TABLE Orders (
  order_number BIGINT,
  price DECIMAL(32,2),
  buyer ROW<first_name STRING, last_name STRING>,
  order_time TIMESTAMP(3)
) WITH (...);
-- create a bounded mock table
CREATE TEMPORARY TABLE GenOrders
WITH (
  'connector' = 'datagen',
  'number-of-rows' = '10'
)
LIKE Orders (EXCLUDING ALL);
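Once the mock table exists, it can be queried like the table it imitates. The query below is only a sketch of how the generated rows might be inspected. It assumes the GenOrders table defined above, and the returned values are random, so no particular output is implied.

```sql
-- Read the 10 generated rows, including a subfield of the ROW column.
SELECT
  order_number,
  price,
  buyer.first_name AS buyer_first_name,
  order_time
FROM GenOrders;
```

Because GenOrders copies its schema from Orders, a query written against the mock table can later be pointed at the real table by changing only the table name.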
Note
This page is derived from the official Apache Flink® documentation.
Refer to the Credits page for more information.