A Basic Block is a straight-line sequence of instructions with a single entry point and a single exit point, meaning there are no branches or joins inside the block.
Specifically:
Entry: control can only enter at the beginning of the block (not in the middle).
Exit: control leaves only at the end of the block, where there is exactly one terminator instruction.
In LLVM IR, basic blocks typically begin with a label (e.g., entry:, if.then:) and must end with a terminator instruction, such as:
br (branch)
ret (return)
switch
Terminators define how control flows to the next block. Instructions in a block execute sequentially until the terminator.
CFG (Control Flow Graph)
An LLVM function is internally represented as a Control Flow Graph (CFG), which is a directed graph of basic blocks.
In a CFG:
Nodes represent basic blocks.
Edges represent control transfers (branches) between blocks.
Example: in an if statement, we might have edges like:
entry -> if.then
entry -> if.else
if.then -> if.end
if.else -> if.end
Graphically, this often forms a diamond shape in the CFG.
Why is CFG Important?
Understanding program flow via CFG is crucial for compiler optimizations and static analysis.
Examples:
Loops appear as cycles in the CFG (edges that return to the same block).
Conditionals appear as branching nodes.
The compiler uses CFG for:
Dominator analysis: determining whether a block must be executed before another.
Reachability analysis: determining which blocks can be executed.
This information drives optimizations such as dead code elimination, loop unrolling, and code motion.
CFG Visualization
LLVM provides tools to visualize CFGs:
opt -dot-cfg file.ll → outputs Graphviz .dot files for each function.
opt -view-cfg file.ll → opens an interactive visualization window.
Note: after optimizations, the CFG may look very different:
Blocks may be merged or removed.
Control paths may be simplified.
In Summary…
A Basic Block is a straight-line sequence of instructions with no internal branches.
Multiple basic blocks together form the Control Flow Graph (CFG), which represents the control flow of a function.
The CFG is a fundamental structure used by compilers to understand program behavior and apply flow-based optimizations.
When reading LLVM IR, it’s useful to think in terms of CFG: each function is a set of basic blocks connected into a graph.