LLVM IR Syntax
Components and Syntax of IR
- LLVM IR has an assembly-like syntax and is organized into modules.
- A module contains global variables and functions; each function definition is made up of basic blocks, and each basic block is a sequence of instructions ending in a terminator instruction such as ret or br.
- The key components of IR are:
- Module
- Represents a single translation unit (for C, typically one source file). It includes target information (target triple), the data layout, global variables, and function declarations/definitions.
- For example, at the top of an IR file you may see:
target triple = "x86_64-pc-linux-gnu"
- Global Variable
- Variables/constants used throughout the program, named with @.
- Example:
@.str = private constant [14 x i8] c"Hello, world\0A\00", align 1
Here, @.str is the global constant's name, [14 x i8] is a 14-byte array (the 13 characters of "Hello, world\n" plus the terminating NUL, written \0A and \00 in the IR string), private makes the symbol visible only within this module, and align 1 specifies 1-byte alignment.
- Function Declaration and Definition
- Functions are declared with declare (signature only; the body lives elsewhere, e.g. in the C library) and defined with define (signature plus body).
- Example:
declare i32 @printf(ptr, ...)
define i32 @add(i32 %a, i32 %b) { ... }
- A definition has the general form:
define <return type> @<name>(<parameters>) <attributes> { ... }
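To tie these components back to source code, here is a small C translation unit that would yield one of each when compiled with clang: a function definition (define), a call to printf, which is only declared, and a string literal that becomes a private global constant. This is a sketch; the global's name (@.str) and the attributes clang attaches are typical of recent clang versions rather than guaranteed, and greet is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/* "Hello, world\n" is 13 characters plus the terminating NUL,
   which is why its IR global has type [14 x i8]. */
static_assert(sizeof("Hello, world\n") == 14, "13 characters + NUL");

/* A function with a body: clang emits a define line for @add. */
int add(int a, int b) {
    return a + b;
}

/* printf is only declared (through <stdio.h>), so the module gets a
   line like `declare i32 @printf(ptr noundef, ...)` instead of a body.
   The string literal becomes a module-level constant such as @.str. */
int greet(void) {
    return printf("Hello, world\n"); /* returns the number of characters written */
}
```

Compiling it with clang -S -emit-llvm and reading the resulting .ll file is a quick way to see the module layout described above.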
Example
C function:
int add(int a, int b) {
int sum = a + b;
return sum;
}
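Simplified IR for this function (the raw output of clang -O0 -S -emit-llvm code.c -o code.ll is longer: at -O0 clang keeps a, b, and sum in stack slots via alloca, store, and load, typically uses numbered names such as %0, and adds attributes; the version below keeps only the essential instructions):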
define i32 @add(i32 %a, i32 %b) {
entry:
%sum = add i32 %a, %b ; add a and b, assign to sum
ret i32 %sum ; return sum
}
- define i32 @add(i32 %a, i32 %b) defines a function add that takes two i32 parameters and returns an i32.
- %a and %b are virtual registers holding the argument values.
- entry: is a basic block label.
- %sum = add i32 %a, %b is an add instruction that stores its result in %sum.
- ret i32 %sum returns %sum.
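The add instruction above operates on i32, and LLVM integer types carry no sign: with no flags attached, add returns the sum modulo 2^32, and the same instruction serves signed and unsigned arithmetic. The C sketch below models that behavior with uint32_t (ir_add_i32 and check_ir_add are hypothetical names). Note that clang actually emits add nsw ("no signed wrap") for a signed C int addition, because signed overflow is undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

/* Models `add i32 %a, %b` without nuw/nsw flags: the result is the
   mathematical sum modulo 2^32. Unsigned arithmetic in C wraps the same way. */
uint32_t ir_add_i32(uint32_t a, uint32_t b) {
    return a + b;
}

/* 0xFFFFFFFF is the bit pattern of -1 when read as a signed i32, so one
   instruction computes both 0xFFFFFFFF + 1 = 0 (unsigned view) and
   -1 + 1 = 0 (signed view). */
void check_ir_add(void) {
    assert(ir_add_i32(2u, 3u) == 5u);
    assert(ir_add_i32(0xFFFFFFFFu, 1u) == 0u);
    assert(ir_add_i32(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFEu); /* -1 + -1 = -2 */
}
```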
SSA (Static Single Assignment)
- LLVM IR follows SSA form, meaning each value is assigned exactly once.
- Each register name (e.g., %sum) is defined only once within a function; a new value needs a new name.
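A C variable that is assigned more than once cannot map to a single SSA register; each assignment gets a fresh name instead. The sketch below (hypothetical function names) writes the same computation both ways, with the corresponding SSA values shown in a comment; the register names %x1 and %x2 are illustrative, not what clang would print.

```c
#include <assert.h>

/* Ordinary C: x is assigned twice. */
int scale_mutable(int a) {
    int x = a + 1;
    x = x * 2;
    return x;
}

/* SSA style: every value is defined exactly once, as in
       %x1 = add i32 %a, 1
       %x2 = mul i32 %x1, 2
       ret i32 %x2                                            */
int scale_ssa(int a) {
    const int x1 = a + 1;
    const int x2 = x1 * 2;
    return x2;
}

/* Both versions compute the same function over a small input range. */
void check_scale(void) {
    for (int a = -100; a <= 100; a++)
        assert(scale_mutable(a) == scale_ssa(a));
}
```

When the reassignments sit on different control-flow paths, straight-line renaming is not enough, and the phi instruction is what merges the competing values.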
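Identifiers and Types
- Identifiers starting with @ are global symbols (functions, global variables); identifiers starting with % name local values such as registers and basic block labels.
- Types (e.g., i32, ptr) are explicitly attached to operands and instructions. Older LLVM versions used typed pointers such as i8*; current versions use the opaque ptr type, as in the declare example above.
- Example: add i32 %a, %b requires both operands to be i32.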
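Because an instruction's operands must have the same type, combining an i8 value with an i32 one requires an explicit conversion instruction first, either zext (zero extension) or sext (sign extension); C performs the equivalent integer promotion implicitly. The sketch below models both on raw bit patterns, using hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* `%w = zext i8 %v to i32`: fill the upper 24 bits with zeros. */
uint32_t zext_i8_to_i32(uint8_t v) {
    return (uint32_t)v;
}

/* `%w = sext i8 %v to i32`: copy bit 7 (the sign bit) into the upper 24 bits. */
uint32_t sext_i8_to_i32(uint8_t v) {
    return (v & 0x80u) ? (0xFFFFFF00u | v) : (uint32_t)v;
}

/* The same 8-bit pattern widens differently depending on the instruction. */
void check_extensions(void) {
    assert(zext_i8_to_i32(0xFF) == 0x000000FFu); /* 255 stays 255 */
    assert(sext_i8_to_i32(0xFF) == 0xFFFFFFFFu); /* -1 stays -1 */
    assert(sext_i8_to_i32(0x7F) == 0x0000007Fu); /* non-negative: same as zext */
}
```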
Common IR Instructions
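- Arithmetic: add, sub, mul, udiv, etc.
- Comparison: icmp, fcmp (return i1 boolean values).
- Control flow: br, switch.
- Phi nodes: phi (selects a value according to the predecessor block control arrived from; used to merge values in SSA form).
- Memory: load, store.
- Stack/other: alloca, call, ret.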
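The control-flow and phi entries fit together in a small example. The comment in the sketch below holds hand-written IR for a max function (not exact clang output: at -O0 clang keeps r in a stack slot instead of using a phi, and optimized builds may replace the branches with a select or the llvm.smax intrinsic). The C body mirrors it: icmp produces the i1 condition, br chooses a block, and phi picks a or b depending on which block control came from. max_i32 and check_max are hypothetical names.

```c
#include <assert.h>

/* Hand-written IR corresponding to max_i32 (illustrative only):

   define i32 @max_i32(i32 %a, i32 %b) {
   entry:
     %cmp = icmp sgt i32 %a, %b                ; i1: is a > b (signed)?
     br i1 %cmp, label %if.then, label %if.else
   if.then:
     br label %if.end
   if.else:
     br label %if.end
   if.end:
     %r = phi i32 [ %a, %if.then ], [ %b, %if.else ]
     ret i32 %r
   }
*/
int max_i32(int a, int b) {
    int r;
    if (a > b)      /* icmp sgt, then br on the i1 result */
        r = a;      /* value the phi takes when coming from %if.then */
    else
        r = b;      /* value the phi takes when coming from %if.else */
    return r;       /* %r = phi ...; ret i32 %r */
}

void check_max(void) {
    assert(max_i32(3, 7) == 7);
    assert(max_i32(-1, -5) == -1);
    assert(max_i32(4, 4) == 4); /* a > b is false, so the else path supplies b */
}
```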
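Example (C: int x = 10; int y = x + 5;), simplified from -O0 output, where each local variable lives in a stack slot (the real output also gives y its own alloca and stores the sum into it):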
%x = alloca i32, align 4 ; allocate stack space for x
store i32 10, ptr %x, align 4 ; store constant 10 into x
%tmp = load i32, ptr %x, align 4 ; load x into tmp
%y = add i32 %tmp, 5 ; compute y = tmp + 5
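The stack-slot pattern can be mirrored directly in C by routing every access to x through a pointer. The sketch below (hypothetical function names) contrasts it with the register form that remains once a pass such as LLVM's mem2reg promotes the alloca to an SSA value; at that point the add has two constant operands and optimization typically folds it to 15.

```c
#include <assert.h>

/* Mirrors the -O0 pattern: x lives in memory and is reached via its address. */
int y_via_stack_slot(void) {
    int slot;            /* %x = alloca i32, align 4 */
    int *x = &slot;
    *x = 10;             /* store i32 10, ptr %x, align 4 */
    int tmp = *x;        /* %tmp = load i32, ptr %x, align 4 */
    return tmp + 5;      /* %y = add i32 %tmp, 5 */
}

/* After promotion to registers: no slot, no load/store, just the add. */
int y_via_registers(void) {
    const int x = 10;
    return x + 5;
}

/* Both forms compute the same value. */
void check_y(void) {
    assert(y_via_stack_slot() == 15);
    assert(y_via_registers() == y_via_stack_slot());
}
```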
In Summary…
- LLVM IR is structured as Module → Function → Basic Block → Instruction.
- Its assembly-like syntax, explicit typing, and SSA form allow precise analysis and optimization during compilation.
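The Module → Function → Basic Block → Instruction nesting is a plain containment hierarchy. Here is a heavily simplified C model of it, with hypothetical type names and instructions stored as text; LLVM's real C++ classes (llvm::Module, llvm::Function, llvm::BasicBlock, llvm::Instruction) carry much more, such as operands, types, and use lists. Global variables are omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the IR hierarchy. */
typedef struct {
    const char *text;          /* e.g. "%sum = add i32 %a, %b" */
} Instruction;

typedef struct {
    const char *label;         /* e.g. "entry" */
    const Instruction *insts;
    size_t n_insts;            /* in valid IR the last one is a terminator (ret, br, ...) */
} BasicBlock;

typedef struct {
    const char *name;          /* e.g. "@add" */
    const BasicBlock *blocks;
    size_t n_blocks;           /* 0 for a declaration such as @printf */
} Function;

typedef struct {
    const char *target_triple; /* e.g. "x86_64-pc-linux-gnu" */
    const Function *funcs;
    size_t n_funcs;
} Module;

/* Walk Module -> Function -> BasicBlock -> Instruction and count instructions. */
size_t count_instructions(const Module *m) {
    assert(m != NULL);
    size_t total = 0;
    for (size_t f = 0; f < m->n_funcs; f++)
        for (size_t b = 0; b < m->funcs[f].n_blocks; b++)
            total += m->funcs[f].blocks[b].n_insts;
    return total;
}
```

For the add example, a module holding @add (one entry block with two instructions) and the declaration @printf (no blocks) yields a count of 2.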