Ververica Streaming Ledger executes transactions with ACID semantics under the isolation level ‘serializable’. That is the strongest isolation level known in database management systems. The semantics are as if the transactions execute serially: each transaction executes individually, and the next transaction only starts once the previous one is complete, seeing all changes made by the previous transaction.
The challenge is to provide these semantics without actually executing the transactions one after the other, which would not be scalable. Ververica Streaming Ledger achieves this through a combination of logical timestamping and conflict-free scheduling of the individual operations that a transaction is comprised of.
Stream processing is highly asynchronous by nature, and many stream processing systems perform durability asynchronously as well, for performance reasons. Apache Flink, for example, implements asynchronous checkpoints for persistence / recovery.
- Because Ververica Streaming Ledger is a library on Apache Flink, its state updates are durable once a checkpoint completes. Hence the durability semantics rely on the type of sink operator that is used with the streaming ledger application:
- Using an exactly-once sink, hard durability is guaranteed once a result event is read from a result stream, but the sink typically introduces additional latency.
- Using an at-least-once sink, the results of transactions are always based on a consistent view, but a result may be subsumed by another result created during a retry (the duplicate in the at-least-once case). The new result will also be based on a consistent view, but may be different from the first result, because it followed a different serializable schedule.
Serializable, Linearizable, Strictly Serializable¶
The isolation level “serializable” is always guaranteed by the Ververica Streaming Ledger implementation. Under common conditions, users can even assume “strictly serializable” concurrency semantics.
Strictly serializable combines the properties of being “serializable”, with the semantics of “linearizable” updates: Linearizable here means that if a transaction event B is put into the stream after transaction event A’s result is received from the result stream, then transaction B’s modifications on the data happen after transaction A’s modification. Simply speaking, the order of transactions obeys the real-world order of triggering the actions.
Due to the asynchronous nature of stream processing, linearizability can only be relied on if one of the two following conditions is fulfilled:
- Transaction A’s changes are durable before transaction B is put into the source stream. This can easily be achieved by using an exactly once sink for the result streams.
- Transactions A and B will be replayed in the same order during recovery. This can be easily achieved with log-style data sources like Kinesis, or Apache Kafka. When relying on linearizable semantics between two transactions, the events that trigger them would need to be added to the same partition or shard, to maintain order upon replay during failure/recovery scenarios.