
Faker

Background information

The simulated data generation connector, Faker, is a built-in connector in the system. It generates test data according to the Java Faker expression that you provide for each field in the table. When you need test data to verify business logic during development or testing, it is recommended that you use the Faker connector.

The following table describes what the Faker connector supports:

| Category | Description |
| --- | --- |
| Supported modes | Source table, dimension table |
| Operating modes | Batch mode, stream mode |
| Data format | |
| Monitoring indicators | |
| API type | SQL |

Usage restrictions

Only some data types are supported, including:

  • CHAR(n)
  • VARCHAR(n)
  • STRING
  • TINYINT
  • SMALLINT
  • INT
  • BIGINT
  • FLOAT
  • DOUBLE
  • DECIMAL
  • BOOLEAN
  • TIMESTAMP
  • ARRAY
  • MAP
  • MULTISET
  • ROW

Syntax

    CREATE TABLE faker_source (
        `name` STRING,
        `age` INT
    ) WITH (
        'connector' = 'faker',
        'fields.name.expression' = '#{superhero.name}',
        'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
    );

WITH parameters

General parameters

| Parameter | Description | Data type | Required | Default | Additional info |
| --- | --- | --- | --- | --- | --- |
| connector | The type of the source table. | String | Yes | None | The value is fixed to faker. |
| fields.<field>.expression | The Java Faker expression that generates the value of this field. | String | Yes | None | |
| fields.<field>.null-rate | The proportion of values of this field that are null. | Float | No | 0 | |
| fields.<field>.length | The size of an ARRAY, MAP, or MULTISET collection. | Integer | No | 1 | None |
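
As a minimal sketch of how the per-field options combine, the following definition sets a null rate on a STRING field and a length on an ARRAY field. The table name faker_options_demo and the field names are hypothetical. It also assumes that the connector applies the single expression to every element of the array. The expressions themselves are the ones used elsewhere in this topic.

```sql
-- Hypothetical table for illustration only.
CREATE TEMPORARY TABLE faker_options_demo (
    `name` STRING,
    `powers` ARRAY<STRING>
) WITH (
    'connector' = 'faker',
    'fields.name.expression' = '#{superhero.name}',
    -- Roughly 10% of the generated name values are expected to be NULL.
    'fields.name.null-rate' = '0.1',
    -- Assumption: the expression is evaluated once per array element.
    'fields.powers.expression' = '#{superhero.power}',
    -- Each generated array holds 3 elements instead of the default 1.
    'fields.powers.length' = '3'
);
```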

Source table parameters

| Parameter | Description | Data type | Required | Default | Additional info |
| --- | --- | --- | --- | --- | --- |
| number-of-rows | The number of rows to generate. | Integer | No | -1 | If this parameter is set, the source table is bounded; otherwise, it is unbounded. |
| rows-per-second | The rate at which rows are generated. | Integer | No | 10000 | The default value is 10000 records per second. |
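
The following sketch shows one way the two source parameters might be combined to get a small, bounded test source. The table name faker_bounded_source and the chosen values are hypothetical; only the parameter names come from the table above.

```sql
-- Hypothetical bounded source for a quick test run.
CREATE TEMPORARY TABLE faker_bounded_source (
    `name` STRING,
    `age` INT
) WITH (
    'connector' = 'faker',
    'fields.name.expression' = '#{superhero.name}',
    'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}',
    -- Setting number-of-rows makes the source bounded: it stops after 100 rows.
    'number-of-rows' = '100',
    -- Lower the generation rate from the default of 10000 rows per second.
    'rows-per-second' = '10'
);
```

Leaving out number-of-rows keeps the source unbounded, which suits long-running streaming tests.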

Field expression

When you use the Faker connector, you must provide an expression in the WITH clause for each field defined in the DDL statement. The expression has the fixed format fields.<field>.expression = #{className.methodName ''parameter'', …}. The following table describes each part.

| Parameter | Description |
| --- | --- |
| field | The name of the field in the DDL statement. |
| className | The name of the Faker class. Java Faker provides about 80 Faker classes, and you can choose the class that matches the data you need. The class name is not case-sensitive. |
| methodName | The name of the method. The method name is not case-sensitive. |
| parameter | The input parameters of the method. Note: (1) Each input parameter must be enclosed in two half-width single quotation marks (''). (2) Multiple parameters are separated by commas (,). |
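
The quoting rule is easier to see side by side. The sketch below assumes the usual SQL convention that a single quote inside a string literal is written twice. The table name faker_quoting_demo is hypothetical.

```sql
-- Expression as Java Faker receives it:  #{number.numberBetween '0','1000'}
-- Inside the SQL string literal, each of those single quotes is doubled.
CREATE TEMPORARY TABLE faker_quoting_demo (
    `age` INT
) WITH (
    'connector' = 'faker',
    'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
);
```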

Example:

This example uses the age field expression fields.age.expression = #{number.numberBetween ''0'',''1000''} from the CREATE TABLE statement shown earlier, together with the Java Faker API documentation, to show how to write the expression for a field in the DDL statement.

In the Java Faker API documentation, find the Number class.

Find the numberBetween method in the Number class and look at its method description.

The numberBetween method returns a number within the specified range.

Based on the class name Number, the method name numberBetween, and the parameters 0 and 1000 passed to the method, the expression for the age field is fields.age.expression = #{number.numberBetween ''0'',''1000''}. It indicates that the generated values of the age field are in the range of 0 to 1000.

Usage examples

Source table example

    CREATE TEMPORARY TABLE heros_source (
        `name` STRING,
        `power` STRING,
        `age` INT
    ) WITH (
        'connector' = 'faker',
        'fields.name.expression' = '#{superhero.name}',
        'fields.power.expression' = '#{superhero.power}',
        'fields.power.null-rate' = '0.05',
        'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
    );

    CREATE TEMPORARY TABLE blackhole_sink (
        `name` STRING,
        `power` STRING,
        `age` INT
    ) WITH (
        'connector' = 'blackhole'
    );

    INSERT INTO blackhole_sink SELECT * FROM heros_source;

Dimension table example

    CREATE TEMPORARY TABLE datagen_source (
        `character_id` INT,
        `location` STRING,
        `proctime` AS PROCTIME()
    ) WITH (
        'connector' = 'datagen'
    );

    CREATE TEMPORARY TABLE faker_dim (
        `character_id` INT,
        `name` STRING
    ) WITH (
        'connector' = 'faker',
        'fields.character_id.expression' = '#{number.numberBetween ''0'',''100''}',
        'fields.name.expression' = '#{harry_potter.characters}'
    );

    SELECT
        c.character_id,
        l.location,
        c.name
    FROM datagen_source AS l
    JOIN faker_dim FOR SYSTEM_TIME AS OF proctime AS c
    ON l.character_id = c.character_id;