Configuring preemption

Preemption is the process of evicting admitted workloads so that pending workloads with higher priority, or workloads whose ClusterQueue is entitled to a fairer share of resources, can be admitted. Alauda Build of Kueue supports several preemption strategies that let administrators control how workloads compete for resources.

Preemption policies

Alauda Build of Kueue supports the following global preemption policies, which can be configured when deploying the Alauda Build of Kueue cluster plugin:

Policy | Description
Classical | Standard preemption behavior based on priority and ClusterQueue-level preemption policies. This is the default.
FairSharing | Enables the fair sharing preemption strategy, where workloads are preempted based on the relative share of resources consumed by each ClusterQueue in a cohort.
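For orientation, upstream Kueue enables fair sharing through its controller Configuration. The fragment below is a sketch based on the upstream v1beta1 configuration API. It is an assumption that the Alauda Build of Kueue plugin maps its FairSharing policy onto equivalent settings; the plugin's deployment options remain the place to select the policy.

```yaml
# Sketch of the upstream Kueue controller configuration for fair sharing.
# Assumption: the plugin's FairSharing policy corresponds to these settings.
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
fairSharing:
  enable: true
  # Strategies are tried in order when choosing preemption candidates.
  preemptionStrategies: [LessThanOrEqualToFinalShare, LessThanInitialShare]
```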

ClusterQueue preemption configuration

You can configure preemption behavior at the ClusterQueue level using the spec.preemption field. This gives fine-grained control over when pending workloads in that ClusterQueue can preempt other workloads, both within the queue and across its cohort.

Example ClusterQueue with preemption configuration

apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: team-a-queue
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 4
  preemption:
    reclaimWithinCohort: Any
    borrowWithinCohort:
      policy: LowerPriority
    withinClusterQueue: LowerPriority

1. preemption: Configures the preemption behavior for this ClusterQueue.
2. reclaimWithinCohort: Controls whether workloads in this ClusterQueue can preempt workloads in other ClusterQueues of the same cohort to reclaim resources that were borrowed. Possible values: Never, Any, LowerPriority.
3. borrowWithinCohort: Controls whether workloads in this ClusterQueue can preempt workloads in other ClusterQueues of the same cohort to borrow resources.
4. borrowWithinCohort.policy: LowerPriority means only lower-priority workloads in other queues can be preempted. Never disables preemption for borrowing.
5. withinClusterQueue: Controls whether workloads within this ClusterQueue can preempt each other. LowerPriority means higher-priority workloads can preempt lower-priority ones. Never disables intra-queue preemption.
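The ClusterQueue example refers to a ResourceFlavor named default-flavor, which has to exist for the quota to be usable. A minimal sketch follows, assuming an empty flavor (no node labels or taints) is sufficient for the cluster and that ResourceFlavor is served at the same API version as the ClusterQueue above.

```yaml
# Minimal ResourceFlavor matching the flavor name used by the ClusterQueue.
# Assumption: no nodeLabels or nodeTaints are needed in this cluster.
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: default-flavor
```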

Preemption field reference

withinClusterQueue

Controls whether a pending workload can preempt other workloads in the same ClusterQueue.

Value | Description
Never | (Default) No preemption within the same ClusterQueue.
LowerPriority | A pending workload can preempt active workloads with lower priority in the same ClusterQueue.
LowerOrNewerEqualPriority | A pending workload can preempt active workloads with lower priority, or with equal priority that are newer than the pending workload.
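Priority-based policies such as LowerPriority only have an effect when workloads carry different priorities. One way to assign them is a WorkloadPriorityClass referenced from the job's labels. The sketch below is illustrative: the class name, namespace, LocalQueue name, and image are hypothetical, and the API version is assumed to match the ClusterQueue examples on this page.

```yaml
# Hypothetical priority class; a higher value means higher priority.
apiVersion: kueue.x-k8s.io/v1beta2
kind: WorkloadPriorityClass
metadata:
  name: high-priority
value: 10000
description: "Illustrative class for urgent batch jobs"
---
# Job submitted to a hypothetical LocalQueue with the priority class above.
apiVersion: batch/v1
kind: Job
metadata:
  name: urgent-job
  namespace: team-a                                  # hypothetical namespace
  labels:
    kueue.x-k8s.io/queue-name: team-a-local-queue    # hypothetical LocalQueue
    kueue.x-k8s.io/priority-class: high-priority
spec:
  suspend: true              # Kueue unsuspends the Job once it is admitted
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: registry.example.com/trainer:latest   # placeholder image
        resources:
          requests:
            cpu: "4"
            memory: 8Gi
```

With withinClusterQueue set to LowerPriority, a pending job in this class could preempt admitted workloads of lower priority in the same ClusterQueue.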

reclaimWithinCohort

Controls whether a pending workload can preempt workloads in other ClusterQueues of the same cohort to reclaim nominal quota that is being borrowed.

Value | Description
Never | (Default) No preemption to reclaim borrowed resources.
Any | A pending workload can preempt workloads in other ClusterQueues of the cohort, regardless of priority.
LowerPriority | A pending workload can only preempt lower-priority workloads in other ClusterQueues of the cohort.

borrowWithinCohort

Controls whether a pending workload can preempt workloads in other ClusterQueues of the same cohort when admitting it requires borrowing resources. The policy is set in the borrowWithinCohort.policy subfield.

Value | Description
Never | (Default) No preemption to borrow resources.
LowerPriority | A pending workload can preempt lower-priority workloads in other ClusterQueues of the cohort to borrow resources.
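Upstream Kueue also defines an optional borrowWithinCohort.maxPriorityThreshold subfield, which limits preemption while borrowing to workloads whose priority is at or below the threshold. The fragment below is a sketch that assumes this subfield is available in the API version shipped with the plugin; the value 100 is illustrative only.

```yaml
# Fragment of a ClusterQueue spec (not a complete object).
preemption:
  reclaimWithinCohort: Any
  borrowWithinCohort:
    policy: LowerPriority
    # Assumption: only workloads with priority <= 100 may be preempted
    # by a workload that needs to borrow.
    maxPriorityThreshold: 100
  withinClusterQueue: LowerPriority
```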

Non-preemptive queues

Some workload types, such as interactive notebook sessions, are not suspendable. These workloads should only be assigned to a local queue backed by a non-preemptive ClusterQueue.

A non-preemptive ClusterQueue keeps the default preemption settings (all set to Never):

apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: non-preemptive-queue
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
  preemption:
    withinClusterQueue: Never
    reclaimWithinCohort: Never
    borrowWithinCohort:
      policy: Never

Note: If you assign a non-suspendable workload (such as a Notebook) to a preemptive queue, the workload might be preempted and fail because it cannot be gracefully suspended and resumed.
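To route such workloads to the non-preemptive ClusterQueue, a LocalQueue in the workload's namespace can point at it. A minimal sketch, assuming a hypothetical namespace and LocalQueue name:

```yaml
# LocalQueue backed by the non-preemptive ClusterQueue defined above.
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  name: notebook-queue        # hypothetical name
  namespace: notebooks        # hypothetical namespace
spec:
  clusterQueue: non-preemptive-queue
```

Non-suspendable workloads in that namespace would then reference notebook-queue through the kueue.x-k8s.io/queue-name label.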