Example Kueue resource configurations for distributed workloads
This page provides example Kueue resource configurations for distributed workloads that use GPU accelerators. These examples demonstrate how to configure ResourceFlavor, ClusterQueue, and LocalQueue objects for different GPU scenarios.
TOC

- NVIDIA GPUs without shared cohort
  - ResourceFlavor for NVIDIA Tesla T4 GPU
  - ResourceFlavor for NVIDIA A30 GPU
  - ClusterQueue for Tesla T4 GPU nodes
  - ClusterQueue for A30 GPU nodes
- NVIDIA GPUs with HAMi virtual GPU sharing
  - ResourceFlavor for HAMi-managed Tesla T4 GPU
  - ResourceFlavor for HAMi-managed A100 GPU
  - ClusterQueue with multiple HAMi GPU flavors
- Mixed physical and virtual GPU management
  - ResourceFlavor for physical GPU nodes
  - ResourceFlavor for HAMi virtual GPU nodes
  - ClusterQueue with both physical and virtual GPU flavors
- Restricting ClusterQueues to specific namespaces

NVIDIA GPUs without shared cohort
In this scenario, you have two types of NVIDIA GPU nodes and you want to configure separate queues for each GPU type without sharing resources between them.
ResourceFlavor for NVIDIA Tesla T4 GPU
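A minimal sketch of such a ResourceFlavor. The flavor name is illustrative, and the node label assumes NVIDIA GPU Feature Discovery is labeling your nodes; verify the exact label value on your cluster:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: t4-flavor    # illustrative name; referenced from a ClusterQueue
spec:
  nodeLabels:
    # Label set by NVIDIA GPU Feature Discovery; check the value with:
    #   kubectl get nodes -L nvidia.com/gpu.product
    nvidia.com/gpu.product: Tesla-T4
```

Workloads admitted under this flavor are scheduled only onto nodes carrying the matching label.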
ResourceFlavor for NVIDIA A30 GPU
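A corresponding sketch for the A30 nodes, again assuming GPU Feature Discovery labels (the flavor name and label value are illustrative):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: a30-flavor    # illustrative name; referenced from a ClusterQueue
spec:
  nodeLabels:
    # Confirm the product string reported on your A30 nodes before using it
    nvidia.com/gpu.product: NVIDIA-A30
```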
ClusterQueue for Tesla T4 GPU nodes
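A sketch of a ClusterQueue that admits workloads against the T4 flavor. The queue name, flavor name, and quota values are illustrative; set nominalQuota to the capacity actually available on your T4 nodes:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: t4-queue
spec:
  namespaceSelector: {}    # all namespaces may submit workloads
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: t4-flavor      # must match the ResourceFlavor name
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 4
```

Because no cohort is set, this queue cannot borrow unused quota from other queues.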
ClusterQueue for A30 GPU nodes
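The A30 queue follows the same shape, pointing at the A30 flavor (names and quotas are again illustrative):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: a30-queue
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: a30-flavor     # must match the ResourceFlavor name
      resources:
      - name: "cpu"
        nominalQuota: 32
      - name: "memory"
        nominalQuota: 128Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 2
```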
NVIDIA GPUs with HAMi virtual GPU sharing
In this scenario, you use Alauda Build of HAMi to enable GPU sharing and slicing. Different GPU models are configured with HAMi-specific resource names.
ResourceFlavor for HAMi-managed Tesla T4 GPU
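A sketch of the HAMi-managed T4 flavor. The node labels here are assumptions: adjust them to however your cluster marks HAMi-managed T4 nodes:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: t4-hami-flavor
spec:
  nodeLabels:
    # Assumed labels: a GPU Feature Discovery product label plus a marker
    # for nodes handled by the HAMi scheduler; verify both on your nodes
    nvidia.com/gpu.product: Tesla-T4
    gpu: "on"
```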
ResourceFlavor for HAMi-managed A100 GPU
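The A100 flavor is analogous. The product label value varies by A100 variant (for example SXM4 vs. PCIe, 40GB vs. 80GB), so treat the value below as a placeholder:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: a100-hami-flavor
spec:
  nodeLabels:
    # Placeholder product string; check the label reported on your A100 nodes
    nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
    gpu: "on"
```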
ClusterQueue with multiple HAMi GPU flavors
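A sketch of a single ClusterQueue covering both HAMi flavors. The T4 quotas follow the figures described below (8 allocations, 800 cores, 64Gi); the A100 core quota of 400 is an assumption:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: hami-gpu-queue
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["nvidia.com/gpualloc", "nvidia.com/total-gpucores", "nvidia.com/total-gpumem"]
    flavors:
    - name: t4-hami-flavor        # tried first when admitting workloads
      resources:
      - name: "nvidia.com/gpualloc"
        nominalQuota: 8
      - name: "nvidia.com/total-gpucores"
        nominalQuota: 800
      - name: "nvidia.com/total-gpumem"
        nominalQuota: 64Gi
    - name: a100-hami-flavor
      resources:
      - name: "nvidia.com/gpualloc"
        nominalQuota: 4
      - name: "nvidia.com/total-gpucores"
        nominalQuota: 400          # assumed value; size to your A100 capacity
      - name: "nvidia.com/total-gpumem"
        nominalQuota: 320Gi        # 4 allocations at 80Gi per GPU
```

Kueue tries the flavors in the order listed, so T4 capacity is consumed before workloads spill over to the A100 flavor.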
- t4-hami-flavor: Quotas for Tesla T4 GPU nodes managed by HAMi. Up to 8 virtual GPU allocations with a total of 800 GPU cores and 64Gi GPU memory.
- a100-hami-flavor: Quotas for A100 GPU nodes managed by HAMi. Up to 4 virtual GPU allocations with larger per-GPU memory (80Gi per GPU).
Mixed physical and virtual GPU management
In this scenario, some GPU nodes are managed by the NVIDIA GPU Device Plugin (physical GPUs) while others are managed by HAMi (virtual GPUs).
ResourceFlavor for physical GPU nodes
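A sketch of a flavor for nodes running the NVIDIA GPU Device Plugin. The gpu-mode label is hypothetical: substitute whatever label your cluster uses to distinguish device-plugin-managed nodes:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: physical-gpu-flavor
spec:
  nodeLabels:
    # Hypothetical label distinguishing device-plugin (physical) GPU nodes
    gpu-mode: physical
```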
ResourceFlavor for HAMi virtual GPU nodes
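And the counterpart for HAMi-managed nodes, using the same hypothetical labeling scheme:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: hami-gpu-flavor
spec:
  nodeLabels:
    # Hypothetical label distinguishing HAMi-managed (virtual) GPU nodes
    gpu-mode: hami
```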
ClusterQueue with both physical and virtual GPU flavors
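A sketch combining both flavors in one ClusterQueue. The physical and HAMi resources sit in separate resourceGroups, and all quota values are illustrative:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: mixed-gpu-queue
spec:
  namespaceSelector: {}
  resourceGroups:
  # Group 1: physical GPUs exposed by the NVIDIA GPU Device Plugin
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: physical-gpu-flavor
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 8
  # Group 2: virtual GPU resources exposed by HAMi
  - coveredResources: ["nvidia.com/gpualloc", "nvidia.com/total-gpucores", "nvidia.com/total-gpumem"]
    flavors:
    - name: hami-gpu-flavor
      resources:
      - name: "nvidia.com/gpualloc"
        nominalQuota: 8
      - name: "nvidia.com/total-gpucores"
        nominalQuota: 800
      - name: "nvidia.com/total-gpumem"
        nominalQuota: 64Gi
```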
Note: Physical GPU resources (nvidia.com/gpu) and virtual GPU resources (nvidia.com/gpualloc, nvidia.com/total-gpucores, nvidia.com/total-gpumem) must be in separate resourceGroups because they are different resource types.
Restricting ClusterQueues to specific namespaces
By default, namespaceSelector: {} allows all namespaces to submit workloads to the ClusterQueue. To restrict access to specific namespaces, use matchLabels:
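For example, assuming the target namespace is named team-ml, you can match on the kubernetes.io/metadata.name label that Kubernetes sets automatically on every namespace:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-ml-queue
spec:
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: team-ml
```

Workloads submitted from any other namespace are not admitted by this queue.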
matchLabels: Restricts this ClusterQueue to accept workloads only from the team-ml namespace. You can match on any label that exists on the target namespace.