Debug Information

  • Debug information refers to metadata that records the correspondence between generated machine code and the original source code.
  • When compiling with the -g flag, the compiler emits extra information such as variable names, source line numbers, and scope information, which ends up in the object file in a standard debugging format such as DWARF.
  • At the LLVM IR level, this debug info appears as metadata embedded in the IR.
  • Metadata provides additional information for compilers and debuggers, while not affecting program execution.
  • In LLVM IR, metadata is denoted using the ! (exclamation mark) syntax, either attached to instructions or defined as metadata nodes.

IR Metadata

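  • Metadata in LLVM IR is optional supplementary information: a pass may drop it at any time, and doing so must not change the program's observable behavior.
  • The guiding principle for debug info: its presence must not change optimization decisions or the generated code.
  • Optimizations should therefore produce the same code with or without -g; debug info serves only as a mapping for debugging tools. Other kinds of metadata, such as optimization hints, may legitimately influence the optimizer.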

Example 1

  • With -g, Clang emits IR where instructions are annotated with !dbg !N entries, and !DIxxx nodes at the bottom describe file names, variables, etc.
store i32 %val, ptr %ptr, align 4, !dbg !15
!15 = !DILocation(line: 42, column: 5, scope: !8)
  • This indicates that the store corresponds to line 42, column 5 in the original source code.
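  • Any instruction that carries a !dbg attachment can thus be traced back to its source origin; instructions without one (for example, some compiler-generated code) have no recorded location.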
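The lookup a tool performs here (follow !dbg !N to the matching !DILocation node) can be sketched in a few lines of Python. This is only a text-level illustration, not how LLVM or a debugger reads debug info; the IR fragment, the regular expressions, and the function name source_locations are assumptions made for the example.

```python
import re

# Hypothetical IR fragment modelled on the example above.
IR = """\
store i32 %val, ptr %ptr, align 4, !dbg !15
ret void, !dbg !16
!15 = !DILocation(line: 42, column: 5, scope: !8)
!16 = !DILocation(line: 43, column: 1, scope: !8)
"""

# Node definition, e.g. "!15 = !DILocation(line: 42, column: 5, ..."
LOC_RE = re.compile(r"^!(\d+) = !DILocation\(line: (\d+), column: (\d+)")
# Instruction ending in an attachment, e.g. "..., !dbg !15"
DBG_RE = re.compile(r"^(.*?), !dbg !(\d+)\s*$")


def source_locations(ir_text):
    """Return (instruction, line, column) for each instruction with a !dbg attachment."""
    lines = [line.strip() for line in ir_text.splitlines()]

    # First pass: collect every !DILocation node by its numeric id.
    locations = {}
    for line in lines:
        m = LOC_RE.match(line)
        if m:
            locations[m.group(1)] = (int(m.group(2)), int(m.group(3)))

    # Second pass: resolve each instruction's !dbg reference.
    resolved = []
    for line in lines:
        m = DBG_RE.match(line)
        if m and m.group(2) in locations:
            resolved.append((m.group(1), *locations[m.group(2)]))
    return resolved


for inst, line, col in source_locations(IR):
    print(f"{line}:{col}  {inst}")
```

Real tools use LLVM's own parser and the DILocation scope chain rather than regular expressions; the sketch only shows the indirection from attachment to node.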

Example 2

  • Local variable debug info can be represented via debug intrinsics such as llvm.dbg.declare or llvm.dbg.value:
call void @llvm.dbg.declare(metadata ptr %x.addr, metadata !11, metadata !DIExpression()), !dbg !15
  • And a corresponding metadata node:
!11 = !DILocalVariable(name: "x", ... )
  • This allows debuggers (e.g., GDB, LLDB) to display the original source variable names and types.
  • Newer LLVM releases (from LLVM 19 onward) represent this information as debug records, printed as #dbg_declare(...) and #dbg_value(...), rather than as calls to intrinsics.
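To make the link between a storage slot and its source variable concrete, here is a small Python sketch that pairs each declare with its !DILocalVariable node. It accepts both the intrinsic form shown above and the newer record-style form; the sample IR (including the fields after name:), the regular expressions, and the function name declared_variables are assumptions for illustration.

```python
import re

# Hypothetical IR lines: the intrinsic form from the example above and
# the record form printed by newer LLVM releases.
IR = """\
call void @llvm.dbg.declare(metadata ptr %x.addr, metadata !11, metadata !DIExpression()), !dbg !15
#dbg_declare(ptr %y.addr, !12, !DIExpression(), !16)
!11 = !DILocalVariable(name: "x", arg: 1, scope: !8, file: !1, line: 3, type: !10)
!12 = !DILocalVariable(name: "y", scope: !8, file: !1, line: 4, type: !10)
"""

INTRINSIC_RE = re.compile(r"@llvm\.dbg\.declare\(metadata ptr (%[\w.]+), metadata !(\d+)")
RECORD_RE = re.compile(r"#dbg_declare\(ptr (%[\w.]+), !(\d+)")
VAR_RE = re.compile(r'^!(\d+) = !DILocalVariable\(name: "([^"]+)"')


def declared_variables(ir_text):
    """Map each IR storage slot (e.g. %x.addr) to its source variable name."""
    lines = ir_text.splitlines()

    # Collect variable names keyed by metadata node id.
    names = {}
    for line in lines:
        m = VAR_RE.match(line.strip())
        if m:
            names[m.group(1)] = m.group(2)

    # Resolve each declare (either syntax) to the variable it describes.
    slots = {}
    for line in lines:
        m = INTRINSIC_RE.search(line) or RECORD_RE.search(line)
        if m and m.group(2) in names:
            slots[m.group(1)] = names[m.group(2)]
    return slots


print(declared_variables(IR))
```

A debugger gets the same association from the DWARF that the backend generates out of these nodes, not from the IR text.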

Other Uses of Metadata

  • Beyond debug info, metadata provides optimization hints:
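    • !range → specifies the possible values of a loaded or returned integer as half-open intervals (e.g., !{i8 0, i8 2} means the value is 0 or 1).
    • !tbaa (Type-Based Alias Analysis) → describes the types accessed by loads and stores so the optimizer can prove that two accesses do not alias, enabling memory access reordering.
    • !llvm.loop → loop hints (e.g., unrolling, vectorization).
    • !prof → profile data, such as branch weights from which branch probabilities are derived.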
  • Metadata is optional but, when present, enables better optimizations and more informative debugging.
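Two of these hints have simple arithmetic meanings, sketched below in Python. The helper names in_range and branch_probabilities are hypothetical, wrapping !range intervals (where the lower bound exceeds the upper) are deliberately not handled, and LLVM's internal probability computation may scale weights differently from this plain normalisation.

```python
def in_range(value, bounds):
    """Check a value against !range-style bounds.

    bounds is a flat list [lo0, hi0, lo1, hi1, ...]; each pair denotes the
    half-open interval [lo, hi). Wrapping intervals (lo > hi) are not handled.
    """
    pairs = zip(bounds[0::2], bounds[1::2])
    return any(lo <= value < hi for lo, hi in pairs)


def branch_probabilities(weights):
    """Normalise !prof branch_weights into per-successor probabilities."""
    total = sum(weights)
    return [w / total for w in weights]


# !range !{i8 0, i8 2} admits exactly the values 0 and 1.
print(in_range(1, [0, 2]), in_range(2, [0, 2]))

# !prof !{!"branch_weights", i32 3, i32 1}: successors weighted 3 to 1.
print(branch_probabilities([3, 1]))
```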

Stripping Metadata

  • To remove metadata from LLVM IR:
    • opt -strip-debug → removes debug info only.
    • opt -strip (spelled opt -passes=strip under the new pass manager) → removes symbol names as well as debug info; other metadata, such as !tbaa, is kept.
  • This can simplify IR for human inspection, though at the cost of losing source-level correspondence.
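To see which parts of the text disappear, the following Python sketch removes !dbg attachments and !DI... node definitions from an IR string. It is a toy, text-level approximation for illustration only and is not a substitute for opt, which operates on the parsed module; the abbreviated IR sample and the function name strip_debug_text are assumptions.

```python
import re

# Hypothetical, abbreviated IR; the !DISubprogram node omits most fields.
IR = """\
define void @f(ptr %ptr, i32 %val) !dbg !8 {
  store i32 %val, ptr %ptr, align 4, !dbg !15
  ret void, !dbg !16
}
!8 = distinct !DISubprogram(name: "f", line: 41)
!15 = !DILocation(line: 42, column: 5, scope: !8)
!16 = !DILocation(line: 43, column: 1, scope: !8)
"""

# An attachment on a function or instruction: " !dbg !8" or ", !dbg !15"
ATTACH_RE = re.compile(r",? !dbg !\d+")
# A debug-info node definition: "!15 = !DILocation(" or "!8 = distinct !DISubprogram("
NODE_RE = re.compile(r"^!\d+ = (distinct )?!DI\w+\(")


def strip_debug_text(ir_text):
    """Drop debug-info nodes and !dbg attachments from IR text (toy version)."""
    kept = []
    for line in ir_text.splitlines():
        if NODE_RE.match(line.strip()):
            continue  # skip the whole debug-info node definition
        kept.append(ATTACH_RE.sub("", line))
    return "\n".join(kept)


print(strip_debug_text(IR))
```

What remains is the bare function body, which is easier to read but can no longer be mapped back to source lines.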

In Summary…

  • Metadata in LLVM IR provides optional information for debugging and optimization without affecting program execution.
  • Debug info is a primary example, mapping IR instructions back to source code.
  • Other forms of metadata serve as hints to optimizers.
  • Understanding metadata helps in interpreting why certain annotations appear in IR and how they influence tools and optimization pipelines.
