Configuring borrowing and lending limits
When ClusterQueues belong to the same cohort, they can share borrowable resources. By default, the unused nominal quota of all the ClusterQueues in a cohort is available for borrowing by other ClusterQueues. You can use borrowing and lending limits to control how much each ClusterQueue can borrow from or lend to other ClusterQueues in the cohort.
Borrowing limit
The borrowingLimit field on a resource within a ClusterQueue defines the maximum amount of unused quota that the ClusterQueue can borrow from the cohort. If not set, there is no borrowing limit.
Example: ClusterQueue with borrowing limit
- borrowingLimit for CPU: This ClusterQueue can borrow up to 4 additional CPU cores from the cohort, for a total usage of 12 CPU cores (8 nominal + 4 borrowed).
- borrowingLimit for memory: This ClusterQueue can borrow up to 16Gi of additional memory, for a total usage of 48Gi.
- borrowingLimit for GPU: This ClusterQueue can borrow up to 2 additional GPUs, for a total usage of 4 GPUs.
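The limits described above could be expressed in a manifest along the following lines. This is a minimal sketch, assuming the `kueue.x-k8s.io/v1beta1` API. The names `team-a-cq`, `shared-cohort`, and `default-flavor`, and the GPU resource name `nvidia.com/gpu`, are illustrative assumptions. The nominal quotas for memory (32Gi) and GPU (2) are derived from the totals stated above.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq             # hypothetical name
spec:
  namespaceSelector: {}       # match all namespaces
  cohort: shared-cohort       # hypothetical cohort name; borrowing requires a cohort
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor    # assumes a ResourceFlavor with this name exists
      resources:
      - name: "cpu"
        nominalQuota: 8
        borrowingLimit: 4     # up to 12 CPU cores in total
      - name: "memory"
        nominalQuota: 32Gi
        borrowingLimit: 16Gi  # up to 48Gi in total
      - name: "nvidia.com/gpu"
        nominalQuota: 2
        borrowingLimit: 2     # up to 4 GPUs in total
```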
Lending limit
The lendingLimit field on a resource within a ClusterQueue defines the maximum amount of unused quota that the ClusterQueue is willing to lend to other ClusterQueues in the cohort. If not set, the entire unused quota is available for lending.
Example: ClusterQueue with lending limit
- lendingLimit for CPU: This ClusterQueue will lend at most 8 unused CPU cores to other queues in the cohort, reserving a minimum of 8 CPU cores for itself.
- lendingLimit for memory: This ClusterQueue will lend at most 32Gi of unused memory.
- lendingLimit of 0 for GPU: This ClusterQueue will not lend any GPU resources to other queues, even if its GPUs are unused. This is useful for reserving expensive GPU resources exclusively for a specific team.
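A manifest matching this description might look like the sketch below, again assuming the `kueue.x-k8s.io/v1beta1` API. The CPU nominal quota of 16 follows from the description (8 lendable + 8 reserved). The memory and GPU nominal quotas (64Gi and 4) are not stated in the text and are assumptions chosen for illustration, as are the names `gpu-team-cq`, `shared-cohort`, `default-flavor`, and the resource name `nvidia.com/gpu`.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-team-cq           # hypothetical name
spec:
  namespaceSelector: {}       # match all namespaces
  cohort: shared-cohort       # hypothetical cohort name; lending requires a cohort
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor    # assumes a ResourceFlavor with this name exists
      resources:
      - name: "cpu"
        nominalQuota: 16
        lendingLimit: 8       # at least 8 CPU cores stay reserved for this queue
      - name: "memory"
        nominalQuota: 64Gi    # assumed value, not given in the text
        lendingLimit: 32Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 4       # assumed value, not given in the text
        lendingLimit: 0       # never lend GPUs, even when idle
```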
Combined example: Two teams sharing a cohort
The following example shows two teams sharing resources within a cohort, with controlled borrowing and lending:
Team A ClusterQueue:
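A minimal sketch of Team A's manifest, assuming the `kueue.x-k8s.io/v1beta1` API. The CPU values and the GPU lending limit come from the description in this section. The GPU nominal quota of 2, the names `team-a-cq`, `shared-cohort`, and `default-flavor`, and the resource name `nvidia.com/gpu` are assumptions for illustration.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq             # hypothetical name
spec:
  namespaceSelector: {}       # match all namespaces
  cohort: shared-cohort       # hypothetical; both teams must use the same cohort
  resourceGroups:
  - coveredResources: ["cpu", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor    # assumes a ResourceFlavor with this name exists
      resources:
      - name: "cpu"
        nominalQuota: 8
        borrowingLimit: 8     # can borrow up to 8 more CPU cores
        lendingLimit: 4       # lends at most 4 unused CPU cores
      - name: "nvidia.com/gpu"
        nominalQuota: 2       # assumed value, not given in the text
        lendingLimit: 1       # lends at most 1 GPU
```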
Team B ClusterQueue:
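A minimal sketch of Team B's manifest under the same assumptions (`kueue.x-k8s.io/v1beta1` API, illustrative names, `nvidia.com/gpu` as the GPU resource name). The GPU nominal quota of 4 is assumed. The cohort name must match the one used by Team A's ClusterQueue for the two queues to share quota.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-b-cq             # hypothetical name
spec:
  namespaceSelector: {}       # match all namespaces
  cohort: shared-cohort       # hypothetical; same cohort as Team A
  resourceGroups:
  - coveredResources: ["cpu", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor    # assumes a ResourceFlavor with this name exists
      resources:
      - name: "cpu"
        nominalQuota: 16
        borrowingLimit: 4     # can borrow up to 4 more CPU cores
        lendingLimit: 8       # lends at most 8 unused CPU cores
      - name: "nvidia.com/gpu"
        nominalQuota: 4       # assumed value, not given in the text
        lendingLimit: 2       # lends at most 2 GPUs
```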
In this configuration:
- Team A has 8 CPU nominal quota, can borrow up to 8 more, and will lend up to 4 when unused.
- Team B has 16 CPU nominal quota, can borrow up to 4 more, and will lend up to 8 when unused.
- GPU lending is tightly controlled: Team A lends at most 1 GPU, Team B lends at most 2 GPUs.
Use borrowing and lending limits to prevent one team from consuming all the shared resources in a cohort. This is especially important for expensive resources like GPUs.