Raw
This article describes how to use the Raw format, including its configuration options and type mapping.
Background Information
The Raw format reads and writes raw, byte-based values as a single column. The Raw format is built in.
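Because the whole message value maps to one column, the declared column type decides how the bytes are interpreted. The following is a minimal sketch, assuming the type mapping matches open-source Apache Flink, where a BYTES column receives the message value unchanged. The table name `raw_events` and its topic are hypothetical placeholders.

```sql
-- Hypothetical source table: each Kafka message value becomes one BYTES value,
-- with no decoding applied.
CREATE TABLE raw_events (
  payload BYTES
) WITH (
  'connector' = 'kafka',
  'topic' = 'raw_events',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'format' = 'raw'
);
```

Declaring the column as STRING instead decodes the same bytes as text, as in the nginx example in the Instructions section.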
Instructions
For example, suppose a Kafka topic contains the following log data in raw format, and you want to read and analyze it with Flink SQL.
47.29.201.179 - - [28/Feb/2019:13:17:10 +0000] "GET /?p=1 HTTP/2.0" 200 5316 "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36" "2.75"
The following example uses the Raw format to read each Kafka message value as an anonymous UTF-8 encoded string:
```sql
CREATE TABLE nginx_log (
  -- The entire message value is decoded as a UTF-8 string into this single column.
  log STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'nginx_log',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'format' = 'raw'
);
```

After the statement above reads the raw data into a plain string, you can use a custom function to split the string into multiple fields for further analysis, such as the my_split function in the following SQL statement.
```sql
-- my_split is a custom (user-defined) function that returns a structured value t.
SELECT t.hostname, t.datetime, t.url, t.browser, ...
FROM (
  SELECT my_split(log) AS t FROM nginx_log
);
```

Conversely, a STRING column can be written to a Kafka topic as an anonymous UTF-8 encoded string value.
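A sketch of the write direction, using hypothetical names (the table `client_ip_sink` and the topic `client_ip`). The built-in Flink function `SPLIT_INDEX` extracts the first space-separated field of each log line, which is the client IP in the sample data, and the result is written back to Kafka as a raw UTF-8 string. For simple extractions like this, no custom function is needed. More complex parsing would still call for something like my_split.

```sql
-- Hypothetical sink table: a single STRING column, written as a raw UTF-8 value.
CREATE TABLE client_ip_sink (
  client_ip STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'client_ip',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'raw'
);

-- SPLIT_INDEX(str, delimiter, index) splits str by the delimiter and
-- returns the field at the zero-based index.
INSERT INTO client_ip_sink
SELECT SPLIT_INDEX(log, ' ', 0)
FROM nginx_log;
```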
Configuration Options
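As a hedged illustration, assume this connector accepts the same options as open-source Apache Flink's raw format: `raw.charset` (default UTF-8) sets the charset used for text values, and `raw.endianness` (default big-endian) sets the byte order used for numeric values. Under that assumption, a source that reads 64-bit integers written in little-endian order might be declared as follows. The table and topic names are hypothetical.

```sql
-- Hypothetical table: each message value is 8 bytes holding one little-endian BIGINT.
CREATE TABLE sensor_readings (
  reading BIGINT
) WITH (
  'connector' = 'kafka',
  'topic' = 'sensor_readings',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'format' = 'raw',
  -- Assumed option name, taken from open-source Flink's raw format.
  'raw.endianness' = 'little-endian'
);
```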
Type Mapping
The following Flink SQL types are supported by the Raw format.
Usage Notes
The Raw format serializes a NULL value as a null byte[]. The Upsert-Kafka connector treats a message with a null value as a tombstone and deletes the record for that key. Therefore, if the value field may contain NULL values, avoid using the Raw format as the value.format of the Upsert-Kafka connector.
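A hedged alternative for a nullable value field is to keep the Upsert-Kafka connector but use a structured value format such as JSON. By default, JSON writes a NULL field inside a non-null message (for example `{"user_id":"u1","status":null}`), so only actual deletions produce tombstones. The table, columns, and topic below are hypothetical.

```sql
-- Hypothetical upsert table whose status column may be NULL.
-- JSON is used for value.format instead of raw, so a NULL status does not
-- turn the message into a tombstone.
CREATE TABLE user_status (
  user_id STRING,
  status  STRING,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'user_status',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json'
);
```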