Skip to main content

Data Type Conversion

Starting with VERA Engine 4.3, Ververica supports enhanced data type conversion behavior, including support for the VARIANT data type. This flexibility simplifies schema handling by allowing both implicit and explicit conversions between various data types.

This guide describes how VERA handles data type conversions, including the support matrix, conversion rules, and common SQL patterns.

Overview of the VARIANT Data Type

The VARIANT data type is a flexible container that can hold values of any other supported data type. It is particularly useful in scenarios where the schema is semi-structured or when you need to handle diverse data types in a single column.

Use Cases

  • Flexible Schema Handling: Storing data from sources with evolving schemas.
  • Generic Data Processing: Passing data through intermediate steps without needing to define specific types until the final output.
  • JSON Integration: Handling dynamic fields in JSON payloads.

Data Type Conversion Rules

VERA Engine supports two types of data type conversions: implicit and explicit.

Implicit Conversion

Implicit conversion occurs automatically when the engine can safely convert a value from one type to another without risk of data loss or ambiguity. For example, converting a TINYINT to a BIGINT is handled implicitly.

Explicit Conversion

Explicit conversion requires the use of a conversion function, such as CAST. This is necessary when:

  • There is a potential for precision loss (for example, converting from BIGINT to INT).
  • The conversion is between types that are not compatible for automatic transformation (for example, STRING to TIMESTAMP).

NULL Handling

The NULL value is untyped and can be converted to any nullable data type. Converting a NULL value to any target type always results in NULL.

Precision-Loss Rules

To ensure data integrity, VERA does not allow implicit conversion from a larger numeric type to a smaller numeric type if precision loss is possible. You must use an explicit CAST in these scenarios.

  • Example: Converting BIGINT to INT requires CAST(column AS INT).

Supported SQL Patterns

Use the following SQL patterns to perform data type conversions:

CAST

The CAST function is the primary way to perform explicit conversions.

SELECT CAST(my_column AS DOUBLE) FROM my_table;

TIMESTAMP Conversions

Use TO_TIMESTAMP and UNIX_TIMESTAMP for conversions involving temporal types.

-- Convert string or numeric to TIMESTAMP
SELECT TO_TIMESTAMP(time_string) FROM my_table;

-- Convert integer to TIMESTAMP via UNIX_TIMESTAMP
SELECT TO_TIMESTAMP(FROM_UNIXTIME(unix_seconds)) FROM my_table;

-- Convert TIMESTAMP to numeric epoch
SELECT CAST(UNIX_TIMESTAMP(CAST(ts_column AS STRING)) AS BIGINT) FROM my_table;

VARIANT as Input or Output

You can cast values to and from the VARIANT type.

-- Cast a literal to VARIANT
SELECT CAST(123 AS VARIANT);

-- Cast VARIANT back to a specific type
SELECT CAST(variant_column AS STRING) FROM my_table;

Type Conversion Support Matrix

The following table describes the support for conversion between various data types in VERA.

Legend:

  • I: Implicit conversion supported.
  • E: Explicit conversion supported.
  • -: Conversion not supported.
Input Type \ Target Typenullbooleantinyintsmallintintbigintdecimalfloatdoubleintervaldatetimetimestamp[var]char[var]binaryvariant
nullIIIIIIIIIIIIIIII
boolean-IEEEEEEE----E-E
tinyint-EIIIIIII---EE-E
smallint-EEIIIIII---EE-E
int-EEEIIIII---EE-E
bigint-EEEEIIII---EE-E
decimal-EEEEEIII---EE-E
float-EEEEEIII---EE-E
double-EEEEEIEI---EE-E
interval---------I------
date----------I-IE-E
time-----------IEE-E
timestamp--EEEEEEE-IEIE-E
[var]char-EEEEEEEEEEEEIEE
[var]binary-------------EIE
variant-EEEEEEEEEEEEEEI

Known Limitations and Constraints

  • INTERVAL: Direct conversion to or from the INTERVAL type is not supported.
  • VARIANT Testing: When using the datagen connector for testing, note that it does not natively support the VARIANT type. You can validate VARIANT behavior by using nested CAST operations (for example, CAST(CAST(1 AS VARIANT) AS INT)).