Basic Block

  • A Basic Block is a straight-line sequence of instructions with a single entry point and a single exit point, meaning there are no branches or joins inside the block.
  • Specifically:
    • Entry: control can only enter at the beginning of the block (not in the middle).
    • Exit: control leaves only at the end of the block, where there is exactly one terminator instruction.
  • In LLVM IR, basic blocks typically begin with a label (e.g., entry:, if.then:) and must end with a terminator instruction, such as:
    • br (branch)
    • ret (return)
    • switch
  • Terminators define how control flows to the next block. Instructions in a block execute sequentially until the terminator.

CFG (Control Flow Graph)

  • An LLVM function is internally represented as a Control Flow Graph (CFG), which is a directed graph of basic blocks.
  • In a CFG:
    • Nodes represent basic blocks.
    • Edges represent control transfers (branches) between blocks.
  • Example: in an if statement, we might have edges like:
    • entry -> if.then
    • entry -> if.else
    • if.then -> if.end
    • if.else -> if.end
  • Graphically, this often forms a diamond shape in the CFG.

Why is CFG Important?

  • Understanding program flow via CFG is crucial for compiler optimizations and static analysis.
  • Examples:
    • Loops appear as cycles in the CFG (edges that return to the same block).
    • Conditionals appear as branching nodes.
  • The compiler uses CFG for:
    • Dominator analysis: determining whether a block must be executed before another.
    • Reachability analysis: determining which blocks can be executed.
  • This information drives optimizations such as dead code elimination, loop unrolling, and code motion.

CFG Visualization

  • LLVM provides tools to visualize CFGs:
    • opt -dot-cfg file.ll → outputs Graphviz .dot files for each function.
    • opt -view-cfg file.ll → opens an interactive visualization window.
  • Note: after optimizations, the CFG may look very different:
    • Blocks may be merged or removed.
    • Control paths may be simplified.

In Summary…

  • A Basic Block is a straight-line sequence of instructions with no internal branches.
  • Multiple basic blocks together form the Control Flow Graph (CFG), which represents the control flow of a function.
  • The CFG is a fundamental structure used by compilers to understand program behavior and apply flow-based optimizations.
  • When reading LLVM IR, it’s useful to think in terms of CFG: each function is a set of basic blocks connected into a graph.

Updated: