Atomic Operation

  • An atomic operation guarantees that a read-modify-write on a shared variable completes as a single, indivisible step: when many threads access the same variable simultaneously, the conflicting accesses are serialized, so race conditions cannot corrupt the value.
  • For example, suppose there is a variable called result that stores the sum of all elements in an array.
  • If multiple threads update this variable without atomic operations, their read-modify-write sequences can interleave, some updates are lost, and the final value is wrong.
  • Atomic operations are also used for counters, histograms (see the sketch below), and any other scenario where multiple threads must update the same memory location.
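  • As an illustration of the histogram use case, the following is a minimal sketch; the kernel name histogramKernel, the byte-valued input, and the 256-bin zero-initialized bins array are assumptions made for this sketch, not part of the notes above.

#include <cuda_runtime.h>

// Sketch of a histogram kernel: each thread classifies one input byte and
// atomically increments the matching bin, so concurrent increments to the
// same bin are never lost.
__global__ void histogramKernel(const unsigned char *input, unsigned int *bins, int N) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;

    if (idx < N) {
        // Many threads may target the same bin; atomicAdd serializes them.
        atomicAdd(&bins[input[idx]], 1u);
    }
}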

Code Example

#include <cuda_runtime.h>

// Kernel to sum an array using atomicAdd
__global__ void atomicSumKernel(const float *input, float *result, int N) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    
    // Each thread adds its element to the result using atomicAdd
    if (idx < N) {
        atomicAdd(result, input[idx]);
    }
}
  • By using the atomicAdd() function, threads update the result variable safely, one at a time.
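  • As a host-side usage sketch (not from the original notes): the accumulator must be zeroed before the launch, and the array size and 256-thread launch configuration below are illustrative assumptions.

#include <cuda_runtime.h>
#include <cstdio>

// Host-side sketch: allocate device buffers, zero the accumulator, launch
// atomicSumKernel, and copy the sum back to the host.
int main() {
    const int N = 1 << 20;                    // illustrative array size
    float *d_input = nullptr, *d_result = nullptr;

    cudaMalloc(&d_input, N * sizeof(float));
    cudaMalloc(&d_result, sizeof(float));
    cudaMemset(d_result, 0, sizeof(float));   // result must start at 0.0f

    // ... fill d_input here, e.g. cudaMemcpy from a host array ...

    int threads = 256;
    int blocks = (N + threads - 1) / threads;
    atomicSumKernel<<<blocks, threads>>>(d_input, d_result, N);

    float h_result = 0.0f;
    cudaMemcpy(&h_result, d_result, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f\n", h_result);

    cudaFree(d_input);
    cudaFree(d_result);
    return 0;
}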

Drawbacks

  • Since atomic operations serialize conflicting updates, heavy contention on a single variable can become a performance bottleneck.
  • To mitigate this, the number of atomic operations can be reduced:
    • For example, when summing an array, each block can first compute a partial sum in shared memory and then have one thread add that partial sum to the global total, so only one atomicAdd is issued per block, as sketched below.
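    • The kernel below is a minimal sketch of that strategy; the 256-thread block size and the name blockSumKernel are assumptions made for the sketch.

#include <cuda_runtime.h>

// Sketch: each block reduces its elements in shared memory with a tree
// reduction, then issues a single atomicAdd per block instead of one per
// element. Assumes a power-of-two block size of 256 threads.
__global__ void blockSumKernel(const float *input, float *result, int N) {
    __shared__ float partial[256];

    int tid = threadIdx.x;
    int idx = threadIdx.x + blockIdx.x * blockDim.x;

    // Load one element per thread; out-of-range threads contribute 0.0f
    partial[tid] = (idx < N) ? input[idx] : 0.0f;
    __syncthreads();

    // Tree reduction within the block
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        __syncthreads();
    }

    // Only one atomic operation per block touches the global accumulator
    if (tid == 0) {
        atomicAdd(result, partial[0]);
    }
}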
