
Create a Python deployment

Preconditions

The services provided by Ververica Cloud Console are constrained by their deployment environment and network environment. Therefore, when you develop Python API jobs, take note of the following limits:

  • Python is pre-installed in your Ververica Cloud Console cluster, and common Python libraries such as pandas, NumPy, and PyArrow are pre-installed in the Python environment. Therefore, you must develop your code with the latest Python version.
  • Java Development Kit (JDK) 1.11 is used in the running environment of Ververica Cloud Console. If your Python API job depends on a third-party JAR package, make sure that the JAR package is compatible with JDK 1.11.
  • Only open source Scala 2.11 is supported. If your Python API job depends on a third-party JAR package, make sure that you use a JAR package that is compatible with Scala 2.11.
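
If you are unsure which versions are available in the cluster, you can submit a small job that logs them. The following sketch is illustrative only; it assumes the pre-installed libraries listed above and simply prints their versions.

```python
# env_check.py - illustrative sketch: log the Python and library versions
# available in the cluster runtime before developing against them.
import sys

import numpy
import pandas
import pyarrow

if __name__ == "__main__":
    print("Python:", sys.version)
    print("pandas:", pandas.__version__)
    print("NumPy:", numpy.__version__)
    print("PyArrow:", pyarrow.__version__)
```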

Step 1: Upload the Python package

Before the job runs, you need to follow these steps to upload the Python package, Python job file, or Python dependency to Ververica Cloud Console.

note

The maximum supported file size is 200 MB, but it's recommended to keep files under 100 MB. If your file exceeds this size, consider splitting it into smaller files. You can then select them under 'Additional Dependencies' when creating the deployment.

  1. On the Dashboard page, open the console for the workspace you want to manage.

  2. In the Console navigation pane, click Artifacts.

  3. Click Upload Artifact and select the Python package that you want to upload.

For Python API jobs, you need to upload the official JAR package of PyFlink.

note

Ververica recommends that you upload Python resources through a separate Python job portal. For more information, see Deploy a Python job.
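
If your Python job consists of multiple modules, you typically bundle them into a .zip file before uploading it as an artifact. The following sketch is illustrative only; it assumes a local example/ package whose layout matches the example.word_count Entry Module used in Step 2.

```python
# package_job.py - illustrative sketch for building a .zip artifact.
# Assumed local layout (hypothetical names):
#   example/
#     __init__.py
#     word_count.py   <- contains the job's main entry point
import shutil

if __name__ == "__main__":
    # Creates example.zip containing the example/ package, so that
    # "example.word_count" can be used as the Entry Module in Step 2.
    shutil.make_archive("example", "zip", root_dir=".", base_dir="example")
    print("Created example.zip")
```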

Step 2: Deploy a Python job

  1. On the Deployments page, click Create Deployment to display the Create Deployment dialog.

  2. Enter the information about the Python job deployment. The following list describes each parameter:

    • Deployment type: Select Python.
    • Deployment name: Enter the name of the Python job.
    • Engine version: Start with the latest VERA version.
    • Python URI: The Uniform Resource Identifier (URI) of the Python draft file that you want to upload. Python draft files can be .py files or .zip files. Note: If your job is of the Python API type, fill in the download address of the official PyFlink JAR package here.
    • Entry Module: The entry point module of the program. If you select a .py Python draft file, you do not need to specify this parameter. If you select a .zip Python draft file, you must specify this parameter. For example, you can enter example.word_count in the Entry Module field.
    • Entry Point Main Arguments: Parameters that you pass in here can be read inside the main method. (1) The parameter information must not exceed 1024 characters in length. We recommend that you do not pass complex parameters, that is, parameters that include line breaks, spaces, or other special characters. If you need to pass in complex parameters, use an additional dependency file instead. (2) If your job is of the Python API type, you must upload your Python job file first. After the Python job file is uploaded, it is placed in the /flink/usrlib/ directory of the job running node by default. Note: If your Python job file is named word_count.py, set Entry Point Main Arguments to -py /flink/usrlib/word_count.py (a minimal example of such a file is sketched after this list). The path of the Python job file must be specified as a full path; /flink/usrlib/ cannot be omitted and cannot be changed.
    • Python Libraries: A third-party Python package. The third-party Python package that you upload is added to the PYTHONPATH of the Python worker process so that it can be accessed directly in Python user-defined functions (UDFs). For more information about how to use third-party Python packages, see Use a third-party Python package.
    • Python Archives: Archive files. Only ZIP-format files such as .zip, .jar, .whl, and .egg are supported. Archive files are decompressed to the working directory of the Python worker process. For example, if the archive file is named mydata.zip, a Python UDF can open mydata.zip/mydata/data.txt to read the data it contains (see the sketch at the end of this section).
    • Additional Dependencies: (1) (Recommended) Select dependency files that you have already uploaded. You must upload the dependency files on the Artifacts page in advance, or upload them in the S3 console. (2) Enter the S3 path of the dependency file. You must upload the dependency file in advance to the S3 bucket that corresponds to the current instance, that is, the S3 bucket that you selected when you activated Flink Full Hosting. (3) Enter the URL of the dependency file. Only URLs that end with the file name are supported, such as s3://xxxxxx/file. You must upload the dependency file in advance to a publicly accessible HTTP service. Note: Only Per-Job clusters support additional dependency files; Session clusters and Session mode jobs do not. When the job runs, the dependency files provided in any of the three ways above are loaded into the /flink/usrlib directory of the pods where the JobManager (JM) and TaskManager (TM) are located.
    • Deploy to the Session cluster: If you select Submit to Session cluster, select the target Session cluster from the drop-down list below. For more information about how to create a Session cluster, see the Session Clusters section.
    • Description: Optionally, fill in a description.
  3. Click Deploy.
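
For reference, the word_count.py file mentioned in the Entry Point Main Arguments description could look roughly like the following. This is only a sketch, assuming a Table API job with a placeholder datagen source and print sink; adapt the connectors and logic to your actual data.

```python
# word_count.py - minimal PyFlink job sketch (sources and sinks are illustrative).
from pyflink.table import EnvironmentSettings, TableEnvironment

def main():
    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    # Placeholder source that generates random words.
    t_env.execute_sql("""
        CREATE TABLE source (
            word STRING
        ) WITH (
            'connector' = 'datagen',
            'rows-per-second' = '1'
        )
    """)

    # Placeholder sink that prints results to the TaskManager logs.
    t_env.execute_sql("""
        CREATE TABLE sink (
            word STRING,
            cnt BIGINT
        ) WITH (
            'connector' = 'print'
        )
    """)

    # Continuous word count over the generated words.
    t_env.execute_sql(
        "INSERT INTO sink SELECT word, COUNT(*) AS cnt FROM source GROUP BY word"
    )

if __name__ == '__main__':
    main()
```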

When the deployment is complete, start the job on the Deployments page. For more information about how to start a job, see the Start Deployment section.
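
As an illustration of the Python Archives parameter described above, a Python UDF can read a file that is shipped in an archive. The following sketch assumes an archive named mydata.zip that contains mydata/data.txt, as in the example from the parameter list; the UDF name and logic are illustrative only.

```python
# udf_example.py - sketch of a Python UDF that reads a file shipped as a
# Python Archive. Archives are decompressed into the working directory of
# the Python worker process, so the file is reachable under
# "mydata.zip/mydata/data.txt" (names taken from the example above).
from pyflink.table import DataTypes
from pyflink.table.udf import udf

@udf(result_type=DataTypes.STRING())
def first_line(dummy):
    # Read the first line of the archived data file.
    with open("mydata.zip/mydata/data.txt") as f:
        return f.readline().strip()
```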