LLVM IR (Intermediate Representation) is the core of the LLVM compiler infrastructure: the common language on which all optimizations and code generation operate. It is a low-level, assembly-like language that remains human-readable while abstracting away most details of specific CPUs and source languages.
Concretely, LLVM IR is a strongly typed, RISC-like virtual assembly language that can use an unlimited number of virtual registers.
Real CPUs have a small, fixed set of registers, but IR code can create as many virtual registers as it needs; mapping them onto physical registers (register allocation) is deferred to the back-end.
As a result, optimizations and analyses can be written without regard to the number or width of registers on any particular CPU, which is what makes target-independent optimization possible.
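To make the idea of unlimited virtual registers concrete, here is a minimal Python sketch, not part of LLVM, that lowers a small arithmetic expression into LLVM-IR-style text. Every intermediate result simply receives a fresh name (%t0, %t1, ...), and each name is assigned exactly once, matching the static single assignment (SSA) form that LLVM IR uses. The classes Arg, BinOp, and Lowering and the function emit_function are hypothetical helpers for illustration only; real LLVM code builds IR through its C++ API (for example, IRBuilder) rather than by formatting strings.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical expression tree: leaves are named i32 function arguments,
# inner nodes are binary integer operations.
@dataclass
class Arg:
    name: str


@dataclass
class BinOp:
    op: str  # one of the LLVM integer opcodes in OPCODES
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Arg, BinOp]
OPCODES = {"add", "sub", "mul"}


class Lowering:
    """Emits IR-style text, giving every intermediate value its own virtual register."""

    def __init__(self) -> None:
        self.lines: list[str] = []
        self.counter = 0

    def fresh(self) -> str:
        # No register pressure to worry about: just mint a new name every time.
        name = f"%t{self.counter}"
        self.counter += 1
        return name

    def lower(self, e: Expr) -> str:
        if isinstance(e, Arg):
            return f"%{e.name}"
        if e.op not in OPCODES:
            raise ValueError(f"unsupported op: {e.op}")
        lhs = self.lower(e.lhs)
        rhs = self.lower(e.rhs)
        dst = self.fresh()  # each register is assigned exactly once (SSA)
        self.lines.append(f"  {dst} = {e.op} i32 {lhs}, {rhs}")
        return dst


def emit_function(name: str, params: list[str], body: Expr) -> str:
    lw = Lowering()
    result = lw.lower(body)
    sig = ", ".join(f"i32 %{p}" for p in params)
    return "\n".join(
        [f"define i32 @{name}({sig}) {{", "entry:", *lw.lines, f"  ret i32 {result}", "}"]
    )


# (a + b) * (a - c)
expr = BinOp("mul", BinOp("add", Arg("a"), Arg("b")), BinOp("sub", Arg("a"), Arg("c")))
print(emit_function("f", ["a", "b", "c"], expr))
# Expected output:
# define i32 @f(i32 %a, i32 %b, i32 %c) {
# entry:
#   %t0 = add i32 %a, %b
#   %t1 = sub i32 %a, %c
#   %t2 = mul i32 %t0, %t1
#   ret i32 %t2
# }
```

The sketch never reuses a name, no matter how large the expression grows; deciding which of these values can share a physical register is left entirely to the back-end.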
Forms of IR
LLVM IR can exist in three equivalent forms:
A human-readable assembly form (.ll file)
An in-memory representation (C++ objects such as llvm::Module, llvm::Function, and llvm::Instruction) used internally by the compiler
A binary bitcode form (.bc file), for storage or transmission
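Developers usually work with the .ll assembly form when inspecting or modifying IR. The compiler itself operates on the in-memory form and uses bitcode to store IR compactly, for example between compilation and link-time optimization; the llvm-as and llvm-dis tools convert between the .ll and .bc forms.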
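Because the .ll and .bc forms are interchangeable, a tool that accepts IR files sometimes needs to tell them apart. The Python sketch below does so by checking magic bytes: raw bitcode begins with the bytes 'B', 'C', 0xC0, 0xDE, and the optional bitcode wrapper header begins with 0x0B17C0DE stored little-endian. The function name ir_form is hypothetical, and treating any other valid UTF-8 as assembly is a simplifying heuristic rather than a check that the module is well-formed; llvm-as or llvm-dis would be the authority on that.

```python
# Raw LLVM bitcode starts with the bytes 'B', 'C', 0xC0, 0xDE.
RAW_BITCODE_MAGIC = b"BC\xc0\xde"
# The optional bitcode wrapper header starts with 0x0B17C0DE, stored little-endian.
WRAPPER_MAGIC = b"\xde\xc0\x17\x0b"


def ir_form(data: bytes) -> str:
    """Guess whether serialized IR is 'bitcode' (.bc) or 'assembly' (.ll).

    Heuristic only: it inspects the leading bytes and does not validate the module.
    """
    if data.startswith(RAW_BITCODE_MAGIC) or data.startswith(WRAPPER_MAGIC):
        return "bitcode"
    if not data:
        return "unknown"
    try:
        data.decode("utf-8")  # textual IR is plain text
    except UnicodeDecodeError:
        return "unknown"
    return "assembly"


print(ir_form(b"BC\xc0\xde\x35\x14\x00\x00"))                   # bitcode
print(ir_form(b"define i32 @f() {\nentry:\n  ret i32 0\n}\n"))  # assembly
print(ir_form(b"\xff\xfe\x00"))                                 # unknown
```

The in-memory form has no file signature at all; it exists only while a compiler or tool has a module loaded.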
Key Characteristics of IR
The most important traits of LLVM IR are platform independence and language independence.
IR generated by front-ends for different languages (such as C, C++, Rust, and Swift) follows a common set of rules and instructions, so the optimizer can apply the same optimizations uniformly regardless of the source language.
Back-ends for architectures such as x86, ARM, and RISC-V then consume that same IR and translate it into machine code for their respective targets.
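However, full independence is not always achievable: a front-end must resolve some target-specific details before emitting IR, such as calling-convention (ABI) rules and the sizes and alignments of types, so IR produced for one target is generally not portable to another.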
Still, IR abstracts away as much as possible, bringing LLVM close to a “write once, run anywhere” ideal: an optimization implemented once benefits every supported language and target.
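Target dependence is also recorded explicitly: a textual module typically declares its target with target datalayout and target triple lines near the top. As a rough illustration, the hypothetical Python helper read_target_info below extracts those two declarations from .ll text with a regular expression. The sample module and its datalayout string are illustrative, not the exact output of any particular compiler version.

```python
import re

# Matches module-level target declarations such as: target triple = "..."
TARGET_DECL = re.compile(r'^target\s+(datalayout|triple)\s*=\s*"([^"]*)"$')


def read_target_info(ll_text: str) -> dict[str, str]:
    """Return the datalayout and triple strings declared in textual IR (if any)."""
    info: dict[str, str] = {}
    for line in ll_text.splitlines():
        match = TARGET_DECL.match(line.strip())
        if match:
            info[match.group(1)] = match.group(2)
    return info


# Illustrative module; the datalayout string is an example, not authoritative.
module = """
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i32 @f(i32 %a) {
entry:
  ret i32 %a
}
"""

info = read_target_info(module)
print(info["triple"])      # x86_64-unknown-linux-gnu
print(info["datalayout"])  # e-m:e-i64:64-f80:128-n8:16:32:64-S128
```

Editing the triple by hand does not make such a module portable, because the front-end has already applied that target's ABI rules and type layouts to the code that follows.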
In Summary…
LLVM IR is the backbone of LLVM, representing code produced by front-ends, serving as the medium for optimizations, and providing the foundation for machine code generation in the back-end.
Understanding IR is crucial for understanding LLVM itself. The ability to read and manipulate IR gives direct insight into what transformations and optimizations occur during the compilation process.