Jobs

A Job resource represents a single Apache Flink® job. The Job resource is derived from a Deployment Template whenever Ververica Platform detects that a new Flink job is needed.

The Job resource is managed exclusively by Ververica Platform and cannot be created manually. Once the Job resource is created, its spec section is effectively immutable and is a point-in-time snapshot of the spec section of the owning Deployment resource.

The status.state field of the Job resource can take the following values:

  • STARTING: The Job is in the process of starting. This includes requesting resources from the Deployment Target and submitting the specified job to it.
  • STARTED: The Job was successfully started.
  • TERMINATING: The Job is in the process of terminating.
  • TERMINATED: The Job has terminated.
  • FAILED: The Deployment Target or Flink has reported an unrecoverable failure. Ververica Platform may report additional failure details.
  • FINISHED: The Job has terminated and finished successfully, e.g. a finite streaming or batch job.

The state machine below shows the possible transitions of a Job's state:

[Figure: Job state transitions (job-transitions.png)]

Ververica Platform retains all Jobs that have reached a terminal state. This allows users to trace back the evolution of the Deployment specification, such as parallelism and other configuration parameters.
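The states listed above can be modelled on the client side as a small enumeration. The following is a minimal Python sketch, not part of Ververica Platform: the names JobState, TERMINAL_STATES, and is_terminal are hypothetical, and treating TERMINATED, FAILED, and FINISHED as the terminal states is an assumption based on the state descriptions above, since this page does not list them explicitly.

```python
from enum import Enum


class JobState(Enum):
    """Values of Job status.state as listed in this document."""
    STARTING = "STARTING"
    STARTED = "STARTED"
    TERMINATING = "TERMINATING"
    TERMINATED = "TERMINATED"
    FAILED = "FAILED"
    FINISHED = "FINISHED"


# Assumption: these are the terminal states; the document does not
# enumerate them, so adjust this set if the state machine differs.
TERMINAL_STATES = frozenset(
    {JobState.TERMINATED, JobState.FAILED, JobState.FINISHED}
)


def is_terminal(state: str) -> bool:
    """Return True if the given status.state string is assumed terminal.

    Raises ValueError for a string that is not one of the listed states.
    """
    return JobState(state.upper()) in TERMINAL_STATES
```

A caller could use is_terminal(job["status"]["state"]) to decide whether a Job is still worth polling, where job is assumed to be the Job resource parsed into a dictionary.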

Note

The status.state of the Job resource is not the same as the Flink job status, because the Job resource state also covers phases such as acquiring resources from the Deployment Target.

Status

STARTED

When a Job reaches the STARTED state, additional information about the Job is provided under status.started. This section of the Job API is considered experimental, and no guarantees are made about its future availability.

kind: Job
status:
  state: STARTED
  started:
    # The Flink Job ID. Currently always equal to Job.metadata.id without
    # dashes, e.g. 3f1da3c8-6f22-430f-9200-eda014ea5319 becomes
    # 3f1da3c86f22430f9200eda014ea5319.
    flinkJobId: string
    # Time that the Job transitioned to the STARTED state.
    transitionTime: string
    # Time of last update to {status.started}.
    lastUpdateTime: string
    # The last observed number of Flink job restarts, including process
    # restarts. The number is a lower bound on the actual number of Flink
    # job restarts. On the happy path, it should be close to the actual
    # number. During failures, there are no guarantees except that it is
    # lower than or equal to the actual number of restarts.
    observedFlinkJobRestarts: number
    # The Flink JobStatus observed in the last probe. Either a Flink Job
    # status string (CREATED, RUNNING, FAILING, FAILED, CANCELLING,
    # CANCELED, FINISHED, RESTARTING, SUSPENDED, RECONCILING) or UNKNOWN
    # if the Flink API was not reachable.
    observedFlinkJobStatus: string
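
The comments on flinkJobId and observedFlinkJobStatus describe two simple rules that a client might want to reproduce. The following Python sketch illustrates them under stated assumptions: the function names flink_job_id and is_flink_status_known are hypothetical, and the ID mapping reflects only the current behaviour described above, which sits in an experimental part of the API and may change.

```python
import uuid

# Flink JobStatus strings listed in the comment on observedFlinkJobStatus.
FLINK_JOB_STATUSES = frozenset({
    "CREATED", "RUNNING", "FAILING", "FAILED", "CANCELLING",
    "CANCELED", "FINISHED", "RESTARTING", "SUSPENDED", "RECONCILING",
})


def flink_job_id(job_metadata_id: str) -> str:
    """Derive the Flink Job ID from Job.metadata.id by dropping the dashes.

    uuid.UUID validates the input and .hex returns the 32 hexadecimal
    digits without separators.
    """
    return uuid.UUID(job_metadata_id).hex


def is_flink_status_known(observed: str) -> bool:
    """Return False for UNKNOWN (Flink API not reachable) or any
    string that is not one of the listed Flink job statuses."""
    return observed in FLINK_JOB_STATUSES
```

For the ID used in the comment above, flink_job_id("3f1da3c8-6f22-430f-9200-eda014ea5319") returns "3f1da3c86f22430f9200eda014ea5319". Because observedFlinkJobRestarts is only a lower bound, a client should not treat it as an exact count even when the observed status is known.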