Savepoints

A Savepoint resource points to a single Apache Flink savepoint. Conversely, a single Flink savepoint can be referenced by multiple Savepoint resources in Ververica Platform.

Specification

The metadata.origin field of a Savepoint indicates how it was created. It takes one of the following values:

  • USER_REQUEST: The Savepoint has been requested manually by a user through Ververica Platform.
  • SUSPEND: The Savepoint was taken when the corresponding Deployment was suspended.
  • COPIED: The Savepoint is a copy of another Savepoint resource. Both Savepoint resources point to the same physical Flink savepoint.
  • RETAINED_CHECKPOINT: The Savepoint is a retained Flink checkpoint that was not discarded after the Flink job was shut down.
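If you process Savepoint resources in a script, it can help to model the origin values explicitly. The following is a minimal sketch, assuming each Savepoint resource is available as a plain dict shaped like the API's JSON (with a metadata.origin string); the names SavepointOrigin and savepoints_with_origin are hypothetical helpers, not part of any Ververica Platform client library.

```python
from enum import Enum


class SavepointOrigin(str, Enum):
    """The metadata.origin values listed above."""

    USER_REQUEST = "USER_REQUEST"  # requested manually by a user through Ververica Platform
    SUSPEND = "SUSPEND"  # taken when the Deployment was suspended
    COPIED = "COPIED"  # copy of another Savepoint resource, same physical Flink savepoint
    RETAINED_CHECKPOINT = "RETAINED_CHECKPOINT"  # retained checkpoint kept after job shutdown


def savepoints_with_origin(savepoints, origin):
    """Return the Savepoint resources (assumed to be dicts) whose metadata.origin matches.

    `origin` may be a SavepointOrigin member or its string value; an unknown
    string raises ValueError. Resources without metadata.origin are skipped.
    """
    wanted = SavepointOrigin(origin).value
    return [
        sp for sp in savepoints
        if (sp.get("metadata") or {}).get("origin") == wanted
    ]
```

For example, savepoints_with_origin(resources, "COPIED") returns only the resources that share a physical savepoint with another Savepoint resource.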

The Restore Strategy of a Deployment resource controls which Savepoint is used to restore the state of its Flink job.

Ververica Platform only keeps track of Flink savepoints that were created through Ververica Platform.

Attention

To use Ververica Platform features that rely on savepoints (such as stateful upgrades or suspending a Deployment), you must either set the Flink configuration parameter state.savepoints.dir in Deployment.spec.template.spec.flinkConfiguration or use Universal Blob Storage.
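Before relying on stateful upgrades or suspension, you could check a Deployment for this precondition. This is a sketch only: it assumes the Deployment is available as a dict parsed from the API's JSON, and because this page does not describe how to detect Universal Blob Storage, that part is passed in as a flag. The function name savepoints_configured is hypothetical.

```python
def _get(data, *keys):
    """Walk nested dicts; return None as soon as a level is missing or not a dict."""
    for key in keys:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


def savepoints_configured(deployment, universal_blob_storage=False):
    """Return True if savepoint-based features should be usable for this Deployment.

    Assumption: the caller knows whether Universal Blob Storage is configured.
    Otherwise, spec.template.spec.flinkConfiguration must contain a
    non-empty state.savepoints.dir entry.
    """
    if universal_blob_storage:
        return True
    flink_conf = _get(deployment, "spec", "template", "spec", "flinkConfiguration")
    if not isinstance(flink_conf, dict):
        return False
    return bool(flink_conf.get("state.savepoints.dir"))
```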

Manually Adding a Savepoint Resource

Savepoints triggered by or through Ververica Platform are added to the Deployment automatically. In some cases, however, you might want to start or recover a Deployment from an Apache Flink state snapshot that Ververica Platform does not track yet. In that case, you need to add a Savepoint resource to the Deployment manually.

The following assumes that you already have a savepoint or (externalized) checkpoint to resume from. To register it, send the following request:

POST /api/v1/namespaces/{namespace}/savepoints
metadata:
  deploymentId: ${deploymentId}
  annotations:
    com.dataartisans.appmanager.controller.deployment.spec.version: ${deploymentSpecVersion}
spec:
  savepointLocation: ${savepointLocation}
  flinkSavepointId: 00000000-0000-0000-0000-000000000000
status:
  state: COMPLETED
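If you script this step, the request above could be sent roughly as follows. This is a minimal sketch using only the Python standard library; it assumes the endpoint accepts and returns JSON and requires no authentication, and the base URL and the function names build_savepoint_resource and post_savepoint are placeholders for illustration.

```python
import json
import urllib.request

SPEC_VERSION_ANNOTATION = "com.dataartisans.appmanager.controller.deployment.spec.version"


def build_savepoint_resource(deployment_id, spec_version, savepoint_location):
    """Build the Savepoint resource body shown above as a Python dict."""
    return {
        "metadata": {
            "deploymentId": deployment_id,
            # Annotation values are strings; spec_version comes from the Deployment.
            "annotations": {SPEC_VERSION_ANNOTATION: spec_version},
        },
        "spec": {
            "savepointLocation": savepoint_location,
            # All-zero ID, exactly as in the request above.
            "flinkSavepointId": "00000000-0000-0000-0000-000000000000",
        },
        "status": {"state": "COMPLETED"},
    }


def post_savepoint(base_url, namespace, body):
    """POST the body to /api/v1/namespaces/{namespace}/savepoints; return the parsed reply.

    Assumption: JSON in, JSON out, no authentication headers needed.
    """
    url = f"{base_url.rstrip('/')}/api/v1/namespaces/{namespace}/savepoints"
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like post_savepoint("http://localhost:8080", "default", build_savepoint_resource(deployment_id, spec_version, savepoint_location)), where the URL and namespace are example values for your own installation.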

This request creates a Savepoint resource for the Deployment with ID deploymentId that points to the snapshot at savepointLocation. Set deploymentSpecVersion to the value of the annotation "com.dataartisans.appmanager.controller.deployment.spec.version" in Deployment.metadata.annotations of the corresponding Deployment. Afterwards, the Snapshots tab of this Deployment in the web user interface shows that the Deployment will be started from this Savepoint. Its origin should be COPIED.
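Reading the spec version from a Deployment can be done with a small helper. The sketch below assumes you already fetched the Deployment from the API and parsed it into a dict; the function name deployment_spec_version is hypothetical. It raises an error instead of returning a default, because a Savepoint without this annotation is not used during restore.

```python
SPEC_VERSION_ANNOTATION = "com.dataartisans.appmanager.controller.deployment.spec.version"


def deployment_spec_version(deployment):
    """Return the spec-version annotation of a Deployment dict.

    Raises KeyError if the annotation is absent, so a Savepoint is never
    posted without it.
    """
    annotations = (deployment.get("metadata") or {}).get("annotations") or {}
    if SPEC_VERSION_ANNOTATION not in annotations:
        raise KeyError(f"Deployment has no {SPEC_VERSION_ANNOTATION!r} annotation")
    return annotations[SPEC_VERSION_ANNOTATION]
```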

Note

Make sure that the provided savepointLocation is valid and accessible from the Apache Flink pods. Otherwise, errors only surface at runtime, when a job tries to restore from this location.

Note

If the com.dataartisans.appmanager.controller.deployment.spec.version annotation is missing, the Savepoint will not be used during restore.
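To catch this case before a restore, you could scan a Deployment's Savepoint resources for the annotation. This is a sketch under the assumption that the resources are available as dicts parsed from the API's JSON; savepoints_missing_spec_version is a hypothetical helper name.

```python
SPEC_VERSION_ANNOTATION = "com.dataartisans.appmanager.controller.deployment.spec.version"


def savepoints_missing_spec_version(savepoints):
    """Return the Savepoint resources that lack the spec-version annotation.

    Such Savepoints are not used during restore, so they are worth fixing
    or re-registering.
    """
    missing = []
    for sp in savepoints:
        annotations = (sp.get("metadata") or {}).get("annotations") or {}
        if SPEC_VERSION_ANNOTATION not in annotations:
            missing.append(sp)
    return missing
```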