Faker
Background information
The Faker connector is a built-in connector for generating simulated data. It produces test data based on the Java Faker expression specified for each field in the table. When you need test data to verify business logic during development or testing, we recommend that you use the Faker connector.
The following table describes the capabilities of the Faker connector:
Category | Description |
---|---|
Supported table types | Source table, dimension table |
Running mode | Batch mode, streaming mode |
Data format | |
Monitoring metrics | |
API type | SQL |
Usage restrictions
Only some data types are supported, including:
- CHAR(n)
- VARCHAR(n)
- STRING
- TINYINT
- SMALLINT
- INT
- BIGINT
- FLOAT
- DOUBLE
- DECIMAL
- BOOLEAN
- TIMESTAMP
- ARRAY
- MAP
- MULTISET
- ROW
Syntax
CREATE TABLE faker_source (
  `name` STRING,
  `age` INT
) WITH (
  'connector' = 'faker',
  'fields.name.expression' = '#{superhero.name}',
  'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
);
WITH parameters
Common parameters
Parameter | Description | Data type | Required | Default | Remarks |
---|---|---|---|---|---|
connector | The connector type. | String | Yes | None | Set the value to faker. |
fields.<field>.expression | The Java Faker expression that generates the value of this field. | String | Yes | None | |
fields.<field>.null-rate | The proportion of generated values for this field that are null. | Float | No | 0 | |
fields.<field>.length | The size of an ARRAY, MAP, or MULTISET collection. | Integer | No | 1 | None |
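As a rough illustration of the common parameters, the following sketch declares a STRING field that is sometimes null and an ARRAY field with a fixed length. The table name faker_params_demo and the powers field are made up for this example, and the expressions are reused from the other examples in this topic. How collection elements are generated is an assumption here, so verify the behavior in your environment.

```sql
-- Hypothetical table that illustrates null-rate and length.
CREATE TEMPORARY TABLE faker_params_demo (
  `name` STRING,
  `powers` ARRAY<STRING>
) WITH (
  'connector' = 'faker',
  'fields.name.expression' = '#{superhero.name}',
  -- Roughly 10% of the generated name values are expected to be NULL.
  'fields.name.null-rate' = '0.1',
  -- Assumption: the expression is evaluated once per array element.
  'fields.powers.expression' = '#{superhero.power}',
  -- Each generated array is expected to contain 3 elements.
  'fields.powers.length' = '3'
);
```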
Source-specific parameters
Parameter | Description | Data type | Required | Default | Remarks |
---|---|---|---|---|---|
number-of-rows | The number of rows to generate. | Integer | No | -1 | If this parameter is set, the source table is bounded. Otherwise, it is unbounded. |
rows-per-second | The rate at which data is generated. | Integer | No | 10000 | Unit: records per second. |
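The two source parameters can be combined to produce a small, bounded data set, which may be convenient for batch-mode tests. The following is a minimal sketch; the table name bounded_faker_source and the chosen values are illustrative only.

```sql
-- Hypothetical bounded source: generation stops after 1000 rows.
CREATE TEMPORARY TABLE bounded_faker_source (
  `name` STRING
) WITH (
  'connector' = 'faker',
  'fields.name.expression' = '#{superhero.name}',
  -- Setting number-of-rows makes the source bounded.
  'number-of-rows' = '1000',
  -- Throttle generation to 100 records per second.
  'rows-per-second' = '100'
);
```

With these values, the source should finish after roughly 10 seconds (1000 rows at 100 rows per second). If number-of-rows is omitted, the same table keeps generating data until the job is stopped.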
Field expression
Usage
When you use the Faker connector, each field defined in the DDL must have a corresponding expression in the WITH clause. The expression has the fixed format fields.<field>.expression = #{className.methodName ''parameter'', …}. The parameters are described in the following table.
Parameter | Description |
---|---|
field | Indicates the specific field name in DDL. |
className | The name of the Faker class. Java Faker provides about 80 Faker classes for generating field values, and you can choose the class that fits your needs. The class name is not case-sensitive. |
methodName | The name of the method. The method name is not case-sensitive. |
parameter | The input parameters of the method. Note: (1) Each input parameter must be enclosed in two half-width single quotation marks (') on each side. (2) Multiple parameters are separated by commas (,). |
Example:
This section uses the age field expression from the Syntax section, fields.age.expression = #{number.numberBetween ''0'',''1000''}, to show how to use the Java Faker API documentation to write the expression for a field in the DDL.
1. In the Java Faker API documentation, find the Number class.
2. Find the numberBetween method in the Number class and read its description. The numberBetween method returns a value within the specified number range.
3. Combine the class name Number, the method name numberBetween, and the parameters 0 and 1000 into the expression for the age field: fields.age.expression = #{number.numberBetween ''0'',''1000''}. This expression means that the generated values of the age field are in the range of 0 to 1000.
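The same lookup steps apply to other classes. The sketch below assumes the Name.fullName and Address.city methods from Java Faker, neither of which takes parameters, alongside the numberBetween method described above. The table name person_source and its field names are hypothetical.

```sql
-- Hypothetical table: each expression follows #{className.methodName ''parameter'', ...}.
CREATE TEMPORARY TABLE person_source (
  `full_name` STRING,
  `city` STRING,
  `age` INT
) WITH (
  'connector' = 'faker',
  -- Name class, fullName method, no parameters.
  'fields.full_name.expression' = '#{name.fullName}',
  -- Address class, city method, no parameters.
  'fields.city.expression' = '#{address.city}',
  -- Number class, numberBetween method, two parameters,
  -- each wrapped in doubled single quotation marks.
  'fields.age.expression' = '#{number.numberBetween ''18'',''65''}'
);
```

Inside the SQL string literal, the doubled single quotation marks are the SQL escape for one single quotation mark, which is why every parameter appears as ''value''.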
Examples
Source table example
CREATE TEMPORARY TABLE heros_source (
  `name` STRING,
  `power` STRING,
  `age` INT
) WITH (
  'connector' = 'faker',
  'fields.name.expression' = '#{superhero.name}',
  'fields.power.expression' = '#{superhero.power}',
  'fields.power.null-rate' = '0.05',
  'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
);
CREATE TEMPORARY TABLE blackhole_sink (
  `name` STRING,
  `power` STRING,
  `age` INT
) WITH (
  'connector' = 'blackhole'
);
INSERT INTO blackhole_sink SELECT * FROM heros_source;
Dimension table example
CREATE TEMPORARY TABLE datagen_source (
  `character_id` INT,
  `location` STRING,
  `proctime` AS PROCTIME()
) WITH (
  'connector' = 'datagen'
);
CREATE TEMPORARY TABLE faker_dim (
  `character_id` INT,
  `name` STRING
) WITH (
  'connector' = 'faker',
  'fields.character_id.expression' = '#{number.numberBetween ''0'',''100''}',
  'fields.name.expression' = '#{harry_potter.characters}'
);
-- Look up each datagen row in the Faker dimension table by character_id.
SELECT
  c.character_id,
  l.location,
  c.name
FROM datagen_source AS l
JOIN faker_dim FOR SYSTEM_TIME AS OF l.proctime AS c
ON l.character_id = c.character_id;