Kaleidoscope Ch 4: Optimizer & JIT
Chapter 4: Adding Optimizer and JIT Support
In Chapter 4 we improve the quality of the generated code by running optimization passes, and we make the language interactive by adding a JIT compiler.
1. Trivial constant folding (already for free)
IRBuilder folds obvious constants while you emit IR. For example, parsing def test(x) 1+2+x does not emit an add 1.0, 2.0 instruction; instead it emits a single x + 3.0 since the builder folded 1+2 up front. This is automatic and requires no AST‑level special cases.
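Concretely, the IR for that definition comes out roughly like this (no 1+2 remains, only the folded 3.0):
define double @test(double %x) {
entry:
  %addtmp = fadd double 3.000000e+00, %x
  ret double %addtmp
}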
When expressions become more complex, local folding is not enough (e.g., (1+2+x)*(x+(1+2)) wants CSE + reassociation). That is where the optimization pipeline comes in.
2. Per‑function optimization pipeline
We set up a FunctionPassManager and add a handful of standard cleanup passes. In modern LLVM this also requires analysis managers and pass instrumentation.
// Globals created alongside TheContext/TheModule/Builder.
static std::unique_ptr<llvm::FunctionPassManager> TheFPM;
static std::unique_ptr<llvm::LoopAnalysisManager> TheLAM;
static std::unique_ptr<llvm::FunctionAnalysisManager> TheFAM;
static std::unique_ptr<llvm::CGSCCAnalysisManager> TheCGAM;
static std::unique_ptr<llvm::ModuleAnalysisManager> TheMAM;
static std::unique_ptr<llvm::PassInstrumentationCallbacks> ThePIC;
static std::unique_ptr<llvm::StandardInstrumentations> TheSI;
void InitializeModuleAndManagers() {
  // Open a new context and module.
  TheContext = std::make_unique<LLVMContext>();
  TheModule = std::make_unique<Module>("KaleidoscopeJIT", *TheContext);
  TheModule->setDataLayout(TheJIT->getDataLayout());

  // Create a new builder for the module.
  Builder = std::make_unique<IRBuilder<>>(*TheContext);

  // Create new pass and analysis managers.
  TheFPM = std::make_unique<FunctionPassManager>();
  TheLAM = std::make_unique<LoopAnalysisManager>();
  TheFAM = std::make_unique<FunctionAnalysisManager>();
  TheCGAM = std::make_unique<CGSCCAnalysisManager>();
  TheMAM = std::make_unique<ModuleAnalysisManager>();
  ThePIC = std::make_unique<PassInstrumentationCallbacks>();
  TheSI = std::make_unique<StandardInstrumentations>(*TheContext, /*DebugLogging=*/true);
  TheSI->registerCallbacks(*ThePIC, TheMAM.get());

  // Add transform passes: a compact, effective cleanup pipeline.
  TheFPM->addPass(InstCombinePass());   // peephole & bit-twiddling
  TheFPM->addPass(ReassociatePass());   // reassociate exprs
  TheFPM->addPass(GVNPass());           // common subexpression elimination
  TheFPM->addPass(SimplifyCFGPass());   // remove dead branches, etc.

  // Register analyses used by these transforms.
  PassBuilder PB;
  PB.registerModuleAnalyses(*TheMAM);
  PB.registerFunctionAnalyses(*TheFAM);
  PB.crossRegisterProxies(*TheLAM, *TheFAM, *TheCGAM, *TheMAM);
}
- InstCombine, Reassociate, GVN, and SimplifyCFG are a good baseline for on‑the‑fly per‑function optimization.
- We call this pipeline after producing a function’s body in FunctionAST::codegen:
// After verifyFunction(*TheFunction);
TheFPM->run(*TheFunction, *TheFAM);
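For context, this is roughly where that call sits at the end of the tutorial’s FunctionAST::codegen, once the body has been emitted and verified:
if (Value *RetVal = Body->codegen()) {
  // Finish off the function and validate the generated code.
  Builder->CreateRet(RetVal);
  verifyFunction(*TheFunction);

  // Run the per-function cleanup pipeline before handing the function out.
  TheFPM->run(*TheFunction, *TheFAM);

  return TheFunction;
}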
With this, (1+2+x)*(x+(1+2)) reduces to the expected tmp = x+3; res = tmp*tmp;.
3. Adding a JIT (ORC, via KaleidoscopeJIT)
The tutorial uses a small wrapper class KaleidoscopeJIT from LLVM’s examples, built on the ORC JIT APIs. We initialize native target support, construct the JIT, and make sure our Module uses the JIT’s data layout.
3.1 Program setup
static std::unique_ptr<KaleidoscopeJIT> TheJIT;
int main() {
  InitializeNativeTarget();
  InitializeNativeTargetAsmPrinter();
  InitializeNativeTargetAsmParser();

  // Precedence table for our language.
  BinopPrecedence['<'] = 10;
  BinopPrecedence['+'] = 20;
  BinopPrecedence['-'] = 20;
  BinopPrecedence['*'] = 40;

  // Prime the first token.
  fprintf(stderr, "ready> ");
  getNextToken();

  // Construct the JIT first, then open the initial module and managers
  // (InitializeModuleAndManagers reads the JIT's data layout).
  TheJIT = ExitOnErr(KaleidoscopeJIT::Create());
  InitializeModuleAndManagers();

  MainLoop();
  return 0;
}
We also ensure InitializeModuleAndManagers() sets the DataLayout from the JIT: TheModule->setDataLayout(TheJIT->getDataLayout());.
3.2 JIT‑compile and run top‑level expressions
Top‑level expressions are compiled into an anonymous function __anon_expr. To evaluate:
- Move the current Module into a ThreadSafeModule and hand it to the JIT via addModule (optionally tracked with a ResourceTracker).
- Call lookup("__anon_expr") to get a symbol, cast it to a function pointer, and call it.
- Remove the temporary module from the JIT (via the tracker) to free memory.
static ExitOnError ExitOnErr;
static void HandleTopLevelExpression() {
  if (auto FnAST = ParseTopLevelExpr()) {
    if (FnAST->codegen()) {
      auto RT = TheJIT->getMainJITDylib().createResourceTracker();
      auto TSM = ThreadSafeModule(std::move(TheModule), std::move(TheContext));
      ExitOnErr(TheJIT->addModule(std::move(TSM), RT));
      InitializeModuleAndManagers(); // open a fresh Module for more input

      auto ExprSymbol = ExitOnErr(TheJIT->lookup("__anon_expr"));
      double (*FP)() = ExprSymbol.getAddress().toPtr<double (*)()>();
      fprintf(stderr, "Evaluated to %f\n", FP());

      ExitOnErr(RT->remove()); // unload the temporary module
    }
  } else {
    getNextToken(); // error recovery
  }
}
This makes the REPL actually execute what the user types: definitions persist in the JIT, and bare expressions are compiled and evaluated immediately.
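A session then looks roughly like this (the tutorial’s example, with the IR dumps trimmed):
ready> 4+5;
Evaluated to 9.000000
ready> def testfunc(x y) x + y*2;
ready> testfunc(4, 10);
Evaluated to 24.000000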
4. Putting it together: workflow
- User types a function or expression.
- Parser/AST builds the tree (Chapter 2).
- Codegen lowers to LLVM IR (Chapter 3).
- Optimizer runs the FPM on the new function (this chapter).
- JIT:
  - For def or extern: the function/prototype becomes available to later code (see the HandleDefinition sketch below).
  - For a top‑level expression: compile it to __anon_expr, run it, print the result, then unload the temporary module.
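As a sketch of the def case, mirroring the tutorial’s HandleDefinition: the module holding the new function is handed to the JIT, and a fresh module is opened for later input.
static void HandleDefinition() {
  if (auto FnAST = ParseDefinition()) {
    if (auto *FnIR = FnAST->codegen()) {
      fprintf(stderr, "Read function definition:");
      FnIR->print(errs());
      fprintf(stderr, "\n");
      // Transfer the module (with the new function) to the JIT, then
      // open a fresh Module for subsequent input.
      ExitOnErr(TheJIT->addModule(
          ThreadSafeModule(std::move(TheModule), std::move(TheContext))));
      InitializeModuleAndManagers();
    }
  } else {
    getNextToken(); // error recovery
  }
}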
5. Pitfalls and checks
- Always set the DataLayout of the Module from the JIT to match the host ABI.
- Remember to verify functions before running passes and JITing.
- Keep optimizations per‑function for the REPL scenario; for an offline compiler you could run module‑level passes after parsing the whole file.
- When JIT‑executing top‑level expressions, use a ResourceTracker so you can remove the temporary module cleanly.
6. Example before/after optimization
Source:
def test(x) (1+2+x)*(x+(1+2));
Unoptimized shape (conceptually):
t0 = (1+2) ; folded locally to 3.0 by IRBuilder for each side
t1 = t0 + x
t2 = x + t0
t3 = t1 * t2
After FPM pipeline:
t0 = x + 3.0 ; reassociate identical adds
t1 = t0 * t0 ; CSE removes duplicate add
Matches the tutorial’s optimized IR.
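For reference, the optimized function the tutorial prints is roughly:
define double @test(double %x) {
entry:
  %addtmp = fadd double %x, 3.000000e+00
  %multmp = fmul double %addtmp, %addtmp
  ret double %multmp
}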
Summary
Chapter 4 upgrades Kaleidoscope from “IR printer” to a simple optimizing JIT:
- Local simplifications via IRBuilder,
- A small but effective per‑function pass pipeline, and
- ORC JIT integration to execute top‑level expressions on the fly.
This sets the stage for control‑flow features in subsequent chapters while keeping compilation fast and interactive.