CUDA 12.6 News: December 2025
December 2025 marks the quiet death of the nvcc command line for 90% of users. NVIDIA’s cuda-python (version 12.6.3) now supports that are indistinguishable from Python native functions, including full support for Python 3.13's subinterpreters.
The library (backported to 12.6 in Q3) now includes automatic tensor memory clustering. What does that mean? Developers writing custom attention mechanisms no longer need to hardcode TMA (Tensor Memory Accelerator) instructions; the compiler infers them. In the latest MLPerf submissions from mid-December, systems running CUDA 12.6 showed a 7-9% latency improvement on Llama-4-70B inference compared to the 12.6 launch driver from 2024, purely from driver-level JIT optimizations.
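To put the 7-9% figure in perspective, here is a rough back-of-the-envelope sketch. It assumes that "latency improvement" means a reduction in per-request latency and that throughput scales inversely with latency at a fixed batch size. Both are simplifying assumptions, and the helper name `throughput_gain` is purely illustrative.

```python
def throughput_gain(latency_reduction: float) -> float:
    """Relative throughput gain implied by a fractional latency reduction.

    Assumption: throughput is inversely proportional to latency
    (fixed batch size, no other bottleneck).
    """
    if not 0.0 <= latency_reduction < 1.0:
        raise ValueError("latency_reduction must be in [0, 1)")
    return 1.0 / (1.0 - latency_reduction) - 1.0


if __name__ == "__main__":
    # The two ends of the range quoted in the article.
    for reduction in (0.07, 0.09):
        gain = throughput_gain(reduction)
        print(f"{reduction:.0%} lower latency -> {gain:.1%} higher throughput")
```

Under those assumptions, the quoted range works out to roughly 7.5% to 9.9% more requests served on the same hardware.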
By December 2025, the toolkit reached Update 3 (12.6.3), focusing on long-term stability and compatibility with newer compilers like Visual Studio 2022.
Released in late 2024, CUDA 12.6 entered 2025 with a whimper. It leaves 2025 with a roar. Here is the state of play for NVIDIA’s moat this December.
CUDA 12.6 initially introduced several tectonic shifts in how NVIDIA manages GPU software, most notably the transition to .
🔧 Key Technical Milestones
As one infrastructure engineer at a FAANG lab (speaking anonymously) told us: "We turned off our custom graph scheduler last month. The runtime scheduler in 12.6 is now better than what we spent three years building."
The biggest news this December isn't a new feature but a deprecation. With NVIDIA’s Grace CPU now shipping in volume for supercomputers (El Capitan’s successors and new EU exascale projects), CUDA 12.6 has officially moved nvcc to a .