On Demand workers

Terminology

This page uses two entirely different concepts of a ‘job’:

  • A Kubernetes Job, referred to as a ‘Kubernetes job’ in this page.
  • An Optimization Server job, referred to as a ‘job’ in this page.

A ‘Kubernetes job’ is a deployment descriptor whose execution is meant to terminate.

A ‘job’ is an execution of a ‘task’; see the Concept section for details.

Why ‘On Demand’ workers?

Before Optimization Server version 3.2.1, workers never stopped: they waited for jobs indefinitely, which reserved significant resources even when no job was running.

In this page, such always-running workers are referred to as ‘Resident’ workers.

Starting from version 3.2.1, Optimization Server can start a worker at the time a job is posted.

These workers, which run only for the duration of the job, are called ‘On Demand’ workers.

Prerequisites

The ‘On Demand’ worker feature relies on Kubernetes jobs.

As a direct consequence, the feature is available only if Optimization Server is deployed on Kubernetes or OKD.

Impact on development

There is no impact on development: workers developed with previous versions of Optimization Server can become ‘On Demand’ with no code change. The worker only needs to include the library as a Java dependency, at version 3.2.1 or later:

dependencies {
    // Any version from 3.2.1 onward supports ‘On Demand’ workers.
    implementation("com.decisionbrain:optimserver-worker:3.5.0")
}

If your worker is not written in Java, use the worker shell with the same version instead. The library provides the implementation of the ‘On Demand’ features transparently.

Impact on deployment

The only difference between a ‘Resident’ worker and an ‘On Demand’ worker lies in the deployment descriptor. The technical details are explained in the sections below.

Deployment of a ‘Resident’ worker

A ‘Resident’ worker is deployed on a Kubernetes cluster through a Deployment. Here is a simplified Deployment of the Cplex worker:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: dbos-worker
  name: dbos-cplex-cpo-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dbos-worker
      name: dbos-cplex-cpo-worker
  template:
    metadata:
      labels:
        app: dbos-worker
        name: dbos-cplex-cpo-worker
    spec:
      containers:
        - image: "cplex-registry.decisionbrain.cloud/dbos/dbos-cplex-cpo-worker:3.5.0"
          imagePullPolicy: Always
          name: dbos-cplex-cpo-worker
          env:
            - name: JAVA_TOOL_OPTIONS
              value: -XX:MaxRAMPercentage=80.0 -XX:+CrashOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError
                -XX:HeapDumpPath=/tmp/cplex-cpo-worker-heap-dump.hprof
            ...
          resources:
            limits:
              memory: 512Mi
            requests:
              cpu: 100m
              memory: 100Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      ...

From this Deployment, Kubernetes starts a pod that listens for incoming requests indefinitely.

Deployment of an ‘On Demand’ worker

An ‘On Demand’ worker is deployed using a ConfigMap. This ConfigMap defines:

  • The workers
  • The execution contexts

Optimization Server comes with a Helm chart that helps generate this ConfigMap.

The workers

The ‘On Demand’ workers are declared in the workers entry of the ConfigMap. A worker is identified by:

  • its Docker image
  • a default execution context
  • the tasks of the worker, each with its execution contexts

Example

apiVersion: v1
kind: ConfigMap
metadata:
  name: "dbos-on-demand-configmap"
data:
  workers: |
    - image: "cplex-registry.decisionbrain.cloud/dbos/dbos-cplex-cpo-worker:3.5.0"
      defaultExecutionContext: small
      tasks:
      - id: CplexTask
        defaultExecutionContext: small
        executionContexts:
        - small
        - medium
      - id: CpoTask
        defaultExecutionContext: small
        executionContexts:
        - small    

  medium: |
  ...
  small: |
  ...  

Here, the tasks ‘CplexTask’ and ‘CpoTask’, available in the worker ‘cplex-registry.decisionbrain.cloud/dbos/dbos-cplex-cpo-worker:3.5.0’, can be executed with the execution contexts (‘small’, ‘medium’) and (‘small’) respectively. The task ids must match the tasks declared in the worker.yml file. For instance, in the Bootstrap section, the declared task id is ‘kcoloring’.

The ‘CplexTask’ task is thus declared as runnable with two execution contexts, ‘small’ and ‘medium’.
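As a further illustration, a custom worker exposing the ‘kcoloring’ task from the Bootstrap section could be declared along the following lines. This is only a sketch: the image name is hypothetical, and it assumes that a ‘small’ execution context is defined in the same ConfigMap.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: "dbos-on-demand-configmap"
data:
  workers: |
    # Hypothetical image name; replace it with the image of your own worker.
    - image: "my-registry.example.com/my-team/kcoloring-worker:1.0.0"
      defaultExecutionContext: small
      tasks:
      # The id must match the task id declared in the worker's worker.yml.
      - id: kcoloring
        defaultExecutionContext: small
        executionContexts:
        - small
  small: |
  ...
```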

The execution contexts

As mentioned above, the ‘On Demand’ worker concept relies on Kubernetes jobs. An execution context is a Kubernetes Job template that Optimization Server uses to start the worker when a job of the corresponding task is posted.

The ‘small’ and ‘medium’ execution contexts are defined as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: "dbos-on-demand-configmap"
data:
  workers: |
        ...
  medium: |
    apiVersion: batch/v1
    kind: Job
    metadata:
      labels:
        project: "cpo-wod"
        businessUnit: "IT"
    spec:
      backoffLimit: 0
      activeDeadlineSeconds: 3600
      template:
        metadata:
          name: dbos-cplex-cpo-wod
          labels:
            project: "cpo-wod"
            businessUnit: "IT"
        spec:
          restartPolicy: Never
          containers:
            - imagePullPolicy: IfNotPresent
              name: dbos-cplex-cpo-wod
              env:
              - name: JAVA_TOOL_OPTIONS
                value: -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4
                  -XX:AdaptiveSizePolicyWeight=90
              ...
              resources:
                limits:
                  memory: 512Mi
                requests:
                  cpu: 100m
                  memory: 512Mi    
  small: |
    apiVersion: batch/v1
    kind: Job
    metadata:
      labels:
        project: "cpo-wod"
        businessUnit: "IT"
    spec:
      backoffLimit: 0
      activeDeadlineSeconds: 3600
      template:
        metadata:
          name: dbos-cplex-cpo-wod
          labels:
            project: "cpo-wod"
            businessUnit: "IT"
        spec:
          restartPolicy: Never
          containers:
            - imagePullPolicy: IfNotPresent
              name: dbos-cplex-cpo-wod
              env:
              - name: JAVA_TOOL_OPTIONS
                value: -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4
                  -XX:AdaptiveSizePolicyWeight=90
              ...
              resources:
                limits:
                  memory: 256Mi
                requests:
                  cpu: 100m
                  memory: 256Mi    

Note that the fields spec.template.spec.containers.image and metadata.name are NOT set because they will be injected by the Optimization Server master.
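To make the injection concrete, the sketch below shows what the Kubernetes job created from the ‘small’ execution context might look like once the master has filled in the two missing fields. It assumes that the injected image is the one declared in the workers entry. The metadata.name value is a placeholder, since the naming scheme used by the master is not described on this page.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  # Placeholder: the actual name is generated by the Optimization Server master.
  name: "<name-injected-by-master>"
  labels:
    project: "cpo-wod"
    businessUnit: "IT"
spec:
  backoffLimit: 0               # from the template: no retry if the pod fails
  activeDeadlineSeconds: 3600   # from the template: the job may run for one hour at most
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: dbos-cplex-cpo-wod
          # Assumed: image taken from the matching entry in 'workers'.
          image: "cplex-registry.decisionbrain.cloud/dbos/dbos-cplex-cpo-worker:3.5.0"
          imagePullPolicy: IfNotPresent
          # env omitted for brevity
          resources:
            limits:
              memory: 256Mi
            requests:
              cpu: 100m
              memory: 256Mi
```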

When you create a job with the console, you can choose the execution context.

You can also choose the execution context when creating a job through the API; see JobCreation and the Java client.

Optimization Server master configuration

The Optimization Server master application configuration needs to be adjusted to enable the ‘On Demand’ feature.

All the parameters are listed here.

The parameters related to the ‘On Demand’ feature are:

Parameter name | Description | Default
OPTIMSERVER_SERVICE_ON-DEMAND-MODE | Activate the ‘on demand’ mode | NONE
OPTIMSERVER_SERVICE_ON-DEMAND-CONFIG-NAME | The name of the ConfigMap to load | dbos-on-demand-configmap
OPTIMSERVER_SERVICE_LOAD-ON-DEMAND-CONFIG-ON-STARTUP | Tell whether the master should load the ConfigMap when it starts | false

To enable the ‘On Demand’ feature, the OPTIMSERVER_SERVICE_ON-DEMAND-MODE parameter must be set to ‘KUBERNETES’.

If OPTIMSERVER_SERVICE_LOAD-ON-DEMAND-CONFIG-ON-STARTUP is set to true, then when the master starts, all the ‘On Demand’ workers are cleared and reloaded from the ConfigMap named in OPTIMSERVER_SERVICE_ON-DEMAND-CONFIG-NAME.
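As an illustration, the three parameters could be set on the master container as shown below. This is a minimal sketch, assuming the master reads these parameters from container environment variables in the same way the worker samples above set JAVA_TOOL_OPTIONS; the container name dbos-master is hypothetical.

```yaml
# Fragment of the master's pod template (spec.template.spec).
containers:
  - name: dbos-master   # hypothetical container name
    env:
      # Enable the 'On Demand' feature (the default is NONE).
      - name: OPTIMSERVER_SERVICE_ON-DEMAND-MODE
        value: "KUBERNETES"
      # ConfigMap that declares the workers and the execution contexts.
      - name: OPTIMSERVER_SERVICE_ON-DEMAND-CONFIG-NAME
        value: "dbos-on-demand-configmap"
      # Clear and reload the 'On Demand' workers each time the master starts.
      - name: OPTIMSERVER_SERVICE_LOAD-ON-DEMAND-CONFIG-ON-STARTUP
        value: "true"
```

The boolean is quoted because Kubernetes requires environment variable values to be strings.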

To declare or update ‘On Demand’ workers without restarting the master, use the Helm chart shipped with Optimization Server, which forces the master to refresh the workers from the ConfigMap. Read this section for more details.