Introduction to CUDA
GPU vs. CPU
- The GPU is a processor specialized for parallel processing.
- A GPU has far more cores than a CPU, but a single GPU core is much weaker than a single CPU core.
- However, by running a large number of cores in parallel, the GPU dramatically accelerates simple, repetitive computations.
- In contrast, the CPU excels in single-core performance and demonstrates high efficiency in handling complex computations with fewer cores.
Concepts of CUDA
Model Structure
- Thread: A thread is the smallest unit of computation in parallel processing. Every thread executes the same instructions but operates on different data; this execution model is called SIMT (Single Instruction, Multiple Threads).
- Block: A block is a group of threads that share a common memory region known as Shared Memory.
- Grid: A grid is the collection of blocks launched to perform a computation (see the sketch after this list).
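As a rough sketch of how these three levels fit together, the kernel below computes each thread's global index from its block and thread coordinates (the name scaleArray and its parameters are illustrative, not from the original text):

```cuda
// Every thread runs the same kernel body (SIMT); blockIdx and threadIdx
// tell each thread which element of the data it owns.
__global__ void scaleArray(float *data, int n) {
    // Global index = block position * threads per block + position in block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {             // guard: the grid may cover more than n elements
        data[i] *= 2.0f;     // same instruction, different data per thread
    }
}
```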
Memory Structure
- Global Memory: The largest memory unit accessible by all threads, but relatively slow in access speed.
- Shared Memory: Memory allocated per block, accessible by all threads within that block (see the reduction sketch after this list).
- Local Memory: Memory allocated individually to each thread.
- Constant & Texture Memory: Used to store read-only data and cache data.
Kernel Functions
- Every kernel executed on the GPU must be declared with the __global__ qualifier.
- A kernel launch specifies the number of blocks and threads per block using the <<< >>> execution-configuration syntax.