GPU Glossary

CUDA

NVIDIA's parallel computing platform and programming model that enables developers to use NVIDIA GPUs for general-purpose computing, including AI training and inference.

CUDA (Compute Unified Device Architecture) is a parallel computing platform and API created by NVIDIA. It allows software developers to write programs that execute on NVIDIA GPUs, harnessing their massive parallelism for general-purpose computation. CUDA provides extensions to standard programming languages like C, C++, and Python, along with a runtime library and driver that manage GPU execution.

Since its introduction in 2007, CUDA has become the dominant platform for GPU-accelerated computing. Virtually all major deep learning frameworks — PyTorch, TensorFlow, JAX — use CUDA as their primary GPU backend. The CUDA ecosystem includes cuDNN for optimized deep learning primitives, cuBLAS for linear algebra, NCCL for multi-GPU communication, and TensorRT for inference optimization, forming a comprehensive stack for AI workloads.

A CUDA program organizes work into kernels, which are functions that execute in parallel across thousands of GPU threads. Threads are grouped into blocks, and blocks are organized into a grid. The CUDA runtime handles scheduling these thread blocks onto the GPU's streaming multiprocessors (SMs). Developers control parallelism at a high level, while the hardware handles fine-grained scheduling and execution.
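The mapping from this thread hierarchy onto data elements is plain arithmetic. As a minimal sketch in Python (the function names here are illustrative, not part of the CUDA API), this is the grid-sizing and indexing math a typical one-dimensional kernel launch relies on:

```python
import math

def launch_config(n_elements: int, threads_per_block: int = 256):
    """Compute how many blocks a CUDA launch needs to cover n_elements.

    Mirrors the common C++ idiom:
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    """
    blocks_per_grid = math.ceil(n_elements / threads_per_block)
    return blocks_per_grid, threads_per_block

def global_thread_index(block_idx: int, block_dim: int, thread_idx: int) -> int:
    # The standard CUDA indexing expression each thread evaluates:
    #   int i = blockIdx.x * blockDim.x + threadIdx.x;
    return block_idx * block_dim + thread_idx

blocks, threads = launch_config(1_000_000)
print(blocks)   # 3907 blocks of 256 threads cover 1,000,000 elements
print(global_thread_index(block_idx=2, block_dim=256, thread_idx=5))  # 517
```

Because the grid is usually slightly larger than the data, real kernels guard with `if (i < n)` so the excess threads in the final block do nothing.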

For most AI practitioners, CUDA operates invisibly beneath the frameworks they use. When you call a PyTorch operation like torch.matmul on GPU tensors, PyTorch dispatches optimized CUDA kernels that execute on the GPU. The framework handles memory allocation, data transfer between CPU and GPU, kernel selection, and synchronization. However, understanding CUDA basics helps when debugging performance issues or evaluating GPU hardware.
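That invisibility is visible at the call site: the same line of PyTorch code runs on CPU or GPU, and only the tensors' device determines whether a CUDA kernel is dispatched. A small sketch (falling back to CPU when no GPU is present, so it runs anywhere):

```python
import torch

# Select the GPU if one is visible; the matmul call below is identical either way.
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(64, 128, device=device)
b = torch.randn(128, 32, device=device)

# On a CUDA device this dispatches an optimized GPU kernel (cuBLAS-backed);
# PyTorch handles memory allocation, kernel selection, and synchronization.
c = torch.matmul(a, b)

print(c.shape)  # torch.Size([64, 32])
```

Note that CUDA kernel launches are asynchronous: the Python call returns before the GPU finishes, and PyTorch synchronizes only when the result is actually needed, which is one reason naive timing of GPU code misleads.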

CUDA's dominance in the AI ecosystem creates a strong lock-in effect for NVIDIA GPUs. While alternatives like AMD's ROCm and Intel's oneAPI exist, CUDA's mature tooling, extensive library ecosystem, and broad framework support make it the default choice for AI infrastructure. Serverless GPU platforms like Cumulus abstract away CUDA management entirely, handling driver versions, library compatibility, and runtime configuration as part of the platform infrastructure.