Iris Coleman
Oct 14, 2025 16:42
NVIDIA introduces Coherent Driver-based Memory Management (CDMM) to improve GPU memory management on hardware-coherent platforms, addressing issues faced by developers and cluster administrators.
NVIDIA has launched a new memory management mode, Coherent Driver-based Memory Management (CDMM), designed to enhance the control and performance of GPU memory on hardware-coherent platforms such as GH200, GB200, and GB300. This development aims to address the challenges posed by non-uniform memory access (NUMA), which can lead to inconsistent system performance when applications are not fully NUMA-aware, according to NVIDIA.
NUMA vs. CDMM
NUMA mode, the current default for NVIDIA drivers on hardware-coherent platforms, exposes both CPU and GPU memory to the operating system (OS). This setup allows memory to be allocated through standard Linux and CUDA APIs and enables dynamic memory migration between CPU and GPU. However, it can also result in GPU memory being treated as a generic pool, potentially hurting application performance.
In contrast, CDMM mode prevents GPU memory from being exposed to the OS as a software NUMA node. Instead, the NVIDIA driver manages GPU memory directly, providing more precise control and potentially boosting application performance. This approach is akin to the PCIe-attached GPU model, where GPU memory remains distinct from system memory.
Implications for Kubernetes
The introduction of CDMM is particularly significant for Kubernetes, a widely used platform for managing large GPU clusters. In NUMA mode, Kubernetes may exhibit unexpected behaviors, such as memory over-reporting and incorrect application of pod memory limits, which can lead to performance issues and application failures. CDMM mode helps mitigate these issues by providing better isolation and control over GPU memory.
Impact on Developers and System Administrators
For CUDA developers, CDMM mode affects how system-allocated memory is handled. While the GPU can still access system-allocated memory across the NVLink chip-to-chip (C2C) connection, memory pages will not migrate as they might in NUMA mode. This change requires developers to adapt their memory management strategies to fully leverage the capabilities of CDMM.
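To illustrate the developer-facing behavior, here is a minimal CUDA sketch (a hypothetical example, not from NVIDIA's announcement) of a GPU kernel touching ordinary malloc'd system memory, which hardware-coherent platforms such as GH200 allow. The code runs in either mode; the difference is that in NUMA mode the driver may migrate the pages to GPU memory, while in CDMM mode they stay in system memory and are accessed remotely over NVLink-C2C:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Increment every element of a system-allocated buffer from the GPU.
__global__ void increment(int *data, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const size_t n = 1 << 20;

    // Plain system allocation: no cudaMalloc or cudaMallocManaged is
    // required on a hardware-coherent platform. In NUMA mode these pages
    // may migrate to GPU memory on access; in CDMM mode they remain in
    // system memory and the GPU reads them over the NVLink-C2C link.
    int *data = (int *)malloc(n * sizeof(int));
    for (size_t i = 0; i < n; ++i) data[i] = 0;

    increment<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();

    printf("data[0] = %d\n", data[0]);
    free(data);
    return 0;
}
```

The key point for CDMM planning is that code like this keeps working, but its performance profile changes: repeated GPU access to system-allocated buffers stays remote instead of being migrated closer to the GPU.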
System administrators will find that tools like numactl or mbind are ineffective for GPU memory management in CDMM mode, since GPU memory is not presented to the OS. However, these tools can still be used to manage system memory.
Guidelines for Choosing Between CDMM and NUMA
When deciding between CDMM and NUMA modes, consider the specific memory management needs of your applications. NUMA mode suits applications that rely on the OS to manage combined CPU and GPU memory. CDMM mode, by contrast, is best for applications that need direct GPU memory control, bypassing the OS for greater performance and predictability.
Ultimately, CDMM mode gives developers and administrators the ability to harness the full potential of NVIDIA's hardware-coherent memory architectures, optimizing performance for GPU-accelerated workloads. For those using platforms like GH200, GB200, or GB300, enabling CDMM mode may provide significant benefits, especially in Kubernetes environments.
Image source: Shutterstock
