Configuring workload management with Kueue

As an administrator, you can configure Alauda Build of Kueue to manage workloads in your cluster. This procedure describes the end-to-end process for setting up Kueue-based workload management.

Prerequisites

  • You have cluster administrator permissions.
  • The Alauda Container Platform Web CLI, or kubectl, can communicate with your cluster.

Procedure

1. Install Alauda Build of Kueue

Install the Alauda Build of Kueue cluster plugin. See Install for detailed instructions.

Verify the installation:

kubectl get pods -n cpaas-system | grep kueue
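
To confirm the controller is ready, you can also wait for its deployment to become available. This check assumes the deployment is named kueue-controller-manager, as in upstream Kueue:

    kubectl wait deployment/kueue-controller-manager -n cpaas-system --for=condition=Available --timeout=120s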

2. Configure Kueue resources

Create the required Kueue resources to enable quota management:

  1. Configure ResourceFlavor objects to represent the different node types in your cluster:

    apiVersion: kueue.x-k8s.io/v1beta2
    kind: ResourceFlavor
    metadata:
      name: default-flavor

    For GPU nodes, add node labels and tolerations:

    apiVersion: kueue.x-k8s.io/v1beta2
    kind: ResourceFlavor
    metadata:
      name: gpu-flavor
    spec:
      nodeLabels:
        nvidia.com/gpu.product: Tesla-T4
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule

    See Configuring quotas for more details.

  2. Configure ClusterQueue objects to define resource quotas and admission rules:

    apiVersion: kueue.x-k8s.io/v1beta2
    kind: ClusterQueue
    metadata:
      name: cluster-queue
    spec:
      namespaceSelector: {}
      resourceGroups:
      - coveredResources: ["cpu", "memory", "pods"]
        flavors:
        - name: "default-flavor"
          resources:
          - name: "cpu"
            nominalQuota: 32
          - name: "memory"
            nominalQuota: 128Gi
          - name: "pods"
            nominalQuota: 20
  3. Configure LocalQueue objects in each namespace that requires Kueue-managed workloads:

    apiVersion: kueue.x-k8s.io/v1beta2
    kind: LocalQueue
    metadata:
      namespace: <project-namespace>
      name: <queue-name>
    spec:
      clusterQueue: cluster-queue
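
These resources can also cover GPUs. If you created the gpu-flavor shown earlier, add a second resource group to the ClusterQueue. This sketch assumes GPUs are exposed as the nvidia.com/gpu resource; adjust nominalQuota to your capacity:

    apiVersion: kueue.x-k8s.io/v1beta2
    kind: ClusterQueue
    metadata:
      name: cluster-queue
    spec:
      namespaceSelector: {}
      resourceGroups:
      - coveredResources: ["cpu", "memory", "pods"]
        flavors:
        - name: "default-flavor"
          resources:
          - name: "cpu"
            nominalQuota: 32
          - name: "memory"
            nominalQuota: 128Gi
          - name: "pods"
            nominalQuota: 20
      - coveredResources: ["nvidia.com/gpu"]
        flavors:
        - name: "gpu-flavor"
          resources:
          - name: "nvidia.com/gpu"
            nominalQuota: 4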

3. Set up RBAC

Configure role-based access control for batch administrators and users. See Setup RBAC.
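
Upstream Kueue ships ClusterRoles for these personas. As a sketch, assuming a kueue-batch-user-role ClusterRole is present in your installation, you can grant a user access to queues in a project namespace:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: kueue-batch-user
      namespace: <project-namespace>
    subjects:
    - kind: User
      name: <user-name>
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: ClusterRole
      name: kueue-batch-user-role
      apiGroup: rbac.authorization.k8s.io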

4. Set up project namespaces

For each project namespace that should use Kueue:

  1. Create a project and namespace in Alauda Container Platform.
  2. Switch to Alauda AI, click Namespace Manage under Admin > Management Namespace, and select the previously created namespace to bring it under management.
  3. Create a LocalQueue in the namespace pointing to the appropriate ClusterQueue.
  4. (Optional) Create a default local queue to automatically manage all workloads in the namespace. See Managing jobs and label policies.
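
A default local queue is simply a LocalQueue named default. In upstream Kueue, workloads submitted without a kueue.x-k8s.io/queue-name label can be assigned to it automatically when the defaulting feature is enabled in your build:

    apiVersion: kueue.x-k8s.io/v1beta2
    kind: LocalQueue
    metadata:
      namespace: <project-namespace>
      name: default
    spec:
      clusterQueue: cluster-queue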

5. Verify the configuration

  1. Verify the ClusterQueue is active:

    kubectl get clusterqueues

    The ClusterQueue should report the Active condition as True. If it does not, run kubectl describe clusterqueue cluster-queue and inspect its conditions; a common cause is a referenced ResourceFlavor that does not exist.

  2. Verify LocalQueues are connected:

    kubectl get localqueues --all-namespaces
  3. Submit a test workload to verify admission:

    apiVersion: batch/v1
    kind: Job
    metadata:
      generateName: test-job-
      namespace: <project-namespace>
      labels:
        kueue.x-k8s.io/queue-name: <queue-name>
    spec:
      template:
        spec:
          containers:
          - name: test
            image: busybox
            command: ["sh", "-c", "echo 'Kueue admission test succeeded' && sleep 10"]
            resources:
              requests:
                cpu: "100m"
                memory: "128Mi"
          restartPolicy: Never

    Save the manifest to a file such as test-job.yaml, then create the Job:

    kubectl create -f test-job.yaml
  4. Verify the workload was admitted:

    kubectl get workloads -n <project-namespace>
    kubectl get pods -n <project-namespace>
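
    An admitted workload reports an Admitted condition in its status. As a sketch, you can list each workload alongside that condition using jsonpath:

    kubectl get workloads -n <project-namespace> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Admitted")].status}{"\n"}{end}'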

Next steps