Chapter 4: Adding Optimizer and JIT Support

In Chapter 4 we improve the quality of generated code by running optimization passes and we make the language interactive by adding a JIT compiler.


1. Trivial constant folding (already for free)

IRBuilder folds obvious constants as you emit IR. For example, codegen for def test(x) 1+2+x does not emit an instruction that adds 1.0 and 2.0; it emits a single add of x and 3.0, because the builder folded 1+2 on the spot. This is automatic and requires no AST-level special cases.
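
To see this in isolation, here is a minimal standalone sketch (not the tutorial's code; the module and value names are just illustrative) that reproduces the folding with a bare IRBuilder:

#include "llvm/ADT/APFloat.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

int main() {
  LLVMContext Ctx;
  Module M("folding-demo", Ctx);
  IRBuilder<> B(Ctx);

  // double test(double x)
  FunctionType *FT =
      FunctionType::get(B.getDoubleTy(), {B.getDoubleTy()}, false);
  Function *F = Function::Create(FT, Function::ExternalLinkage, "test", M);
  B.SetInsertPoint(BasicBlock::Create(Ctx, "entry", F));

  // CreateFAdd on two constants never becomes an instruction: the default
  // constant folder hands back the ConstantFP 3.0 immediately.
  Value *Three = B.CreateFAdd(ConstantFP::get(Ctx, APFloat(1.0)),
                              ConstantFP::get(Ctx, APFloat(2.0)), "addtmp");
  Value *Sum = B.CreateFAdd(Three, F->getArg(0), "addtmp"); // the only fadd
  B.CreateRet(Sum);

  M.print(errs(), nullptr); // body shows a single fadd of 3.0 and %x
  return 0;
}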

When expressions get more complex, this purely local folding is not enough; (1+2+x)*(x+(1+2)), for example, needs reassociation plus common-subexpression elimination to collapse into a single add and a multiply. That is where the optimization pipeline comes in.


2. Per‑function optimization pipeline

We set up a FunctionPassManager and add a handful of standard cleanup passes. In modern LLVM this also requires analysis managers and pass instrumentation.
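
The pass classes and managers used below come from a handful of headers; a typical include set for a recent LLVM looks roughly like this (exact paths can shift between versions):

#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Passes/StandardInstrumentations.h"
#include "llvm/Transforms/InstCombine/InstCombine.h"
#include "llvm/Transforms/Scalar/GVN.h"
#include "llvm/Transforms/Scalar/Reassociate.h"
#include "llvm/Transforms/Scalar/SimplifyCFG.h"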

// Globals created alongside TheContext/TheModule/Builder.
static std::unique_ptr<llvm::FunctionPassManager> TheFPM;
static std::unique_ptr<llvm::LoopAnalysisManager> TheLAM;
static std::unique_ptr<llvm::FunctionAnalysisManager> TheFAM;
static std::unique_ptr<llvm::CGSCCAnalysisManager> TheCGAM;
static std::unique_ptr<llvm::ModuleAnalysisManager> TheMAM;
static std::unique_ptr<llvm::PassInstrumentationCallbacks> ThePIC;
static std::unique_ptr<llvm::StandardInstrumentations> TheSI;

void InitializeModuleAndManagers() {
  // Open a new context and module.
  TheContext = std::make_unique<LLVMContext>();
  TheModule  = std::make_unique<Module>("KaleidoscopeJIT", *TheContext);
  TheModule->setDataLayout(TheJIT->getDataLayout());

  // Create a new builder for the module.
  Builder = std::make_unique<IRBuilder<>>(*TheContext);

  // Create new pass and analysis managers.
  TheFPM = std::make_unique<FunctionPassManager>();
  TheLAM = std::make_unique<LoopAnalysisManager>();
  TheFAM = std::make_unique<FunctionAnalysisManager>();
  TheCGAM = std::make_unique<CGSCCAnalysisManager>();
  TheMAM = std::make_unique<ModuleAnalysisManager>();
  ThePIC = std::make_unique<PassInstrumentationCallbacks>();
  TheSI  = std::make_unique<StandardInstrumentations>(*TheContext, /*DebugLogging=*/true);
  TheSI->registerCallbacks(*ThePIC, TheMAM.get());

  // Add transform passes: a compact, effective cleanup pipeline.
  TheFPM->addPass(InstCombinePass());     // peephole & bit‑twiddling
  TheFPM->addPass(ReassociatePass());     // reassociate exprs
  TheFPM->addPass(GVNPass());             // common subexpr elim
  TheFPM->addPass(SimplifyCFGPass());     // remove dead branches, etc.

  // Register analyses used by these transforms.
  PassBuilder PB;
  PB.registerModuleAnalyses(*TheMAM);
  PB.registerFunctionAnalyses(*TheFAM);
  PB.crossRegisterProxies(*TheLAM, *TheFAM, *TheCGAM, *TheMAM);
}

  • InstCombine, Reassociate, GVN, and SimplifyCFG are a good baseline for on-the-fly per-function optimization.
  • We run this pipeline right after emitting and verifying a function's body in FunctionAST::codegen:

// After verifyFunction(*TheFunction);
TheFPM->run(*TheFunction, *TheFAM);
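
In context, the tail of FunctionAST::codegen then has roughly this shape (following the tutorial's structure):

if (Value *RetVal = Body->codegen()) {
  // Finish off the function by returning the computed value.
  Builder->CreateRet(RetVal);

  // Validate the generated code, checking for consistency.
  verifyFunction(*TheFunction);

  // Optimize the freshly generated function.
  TheFPM->run(*TheFunction, *TheFAM);

  return TheFunction;
}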

With this, (1+2+x)*(x+(1+2)) reduces to the expected tmp = x+3; res = tmp*tmp;.


3. Adding a JIT (ORC, via KaleidoscopeJIT)

The tutorial uses a small wrapper class KaleidoscopeJIT from LLVM’s examples, built on the ORC JIT APIs. We initialize native target support, construct the JIT, and make sure our Module uses the JIT’s data layout.
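
The snippets below assume the headers and using-directives from the tutorial's source tree (the KaleidoscopeJIT.h path is the tutorial's; yours may differ), roughly:

#include "../include/KaleidoscopeJIT.h"                 // the tutorial's ORC wrapper
#include "llvm/ExecutionEngine/Orc/ThreadSafeModule.h"  // ThreadSafeModule
#include "llvm/Support/Error.h"                         // ExitOnError
#include "llvm/Support/TargetSelect.h"                  // InitializeNativeTarget*
using namespace llvm;
using namespace llvm::orc;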

3.1 Program setup

static std::unique_ptr<KaleidoscopeJIT> TheJIT;

int main() {
  InitializeNativeTarget();
  InitializeNativeTargetAsmPrinter();
  InitializeNativeTargetAsmParser();

  // Precedence table for our language.
  BinopPrecedence['<'] = 10;
  BinopPrecedence['+'] = 20;
  BinopPrecedence['-'] = 20;
  BinopPrecedence['*'] = 40;

  fprintf(stderr, "ready> ");
  getNextToken();

  TheJIT = ExitOnErr(KaleidoscopeJIT::Create());
  InitializeModuleAndManagers(); // needs TheJIT for the data layout

  MainLoop();
  return 0;
}

Because InitializeModuleAndManagers() sets the Module's DataLayout from the JIT (TheModule->setDataLayout(TheJIT->getDataLayout());), the JIT must exist before the first call to it, hence the ordering in main() above.

3.2 JIT‑compile and run top‑level expressions

Top‑level expressions are compiled into an anonymous function __anon_expr. To evaluate:

  1. Move the current Module (and its LLVMContext) into a ThreadSafeModule and add it to the JIT, tracked by a ResourceTracker so it can be removed later.
  2. Look up "__anon_expr", cast the symbol's address to a double (*)() function pointer, and call it.
  3. Remove the temporary module from the JIT (via the tracker) to free its memory.

static ExitOnError ExitOnErr;

static void HandleTopLevelExpression() {
  if (auto FnAST = ParseTopLevelExpr()) {
    if (FnAST->codegen()) {
      auto RT  = TheJIT->getMainJITDylib().createResourceTracker();
      auto TSM = ThreadSafeModule(std::move(TheModule), std::move(TheContext));
      ExitOnErr(TheJIT->addModule(std::move(TSM), RT));
      InitializeModuleAndManagers(); // open a fresh Module for more input

      auto ExprSymbol = ExitOnErr(TheJIT->lookup("__anon_expr"));
      double (*FP)() = ExprSymbol.getAddress().toPtr<double (*)()>();
      fprintf(stderr, "Evaluated to %f\n", FP());

      ExitOnErr(RT->remove()); // unload the temporary module
    }
  } else {
    getNextToken(); // error recovery
  }
}

This makes the REPL actually execute what the user types: definitions persist in the JIT, and bare expressions are compiled and evaluated immediately.
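
For the "definitions persist" part, HandleDefinition gets the same treatment: after codegen it hands the current module to the JIT and opens a fresh one. A sketch in the shape of the tutorial's code (omitting the FunctionProtos bookkeeping the full listing adds for cross-module calls):

static void HandleDefinition() {
  if (auto FnAST = ParseDefinition()) {
    if (auto *FnIR = FnAST->codegen()) {
      fprintf(stderr, "Read function definition:");
      FnIR->print(errs());
      fprintf(stderr, "\n");
      // Transfer the module holding the new definition to the JIT, then
      // open a fresh module for subsequent input.
      ExitOnErr(TheJIT->addModule(
          ThreadSafeModule(std::move(TheModule), std::move(TheContext))));
      InitializeModuleAndManagers();
    }
  } else {
    getNextToken(); // error recovery
  }
}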


4. Putting it together: workflow

  • User types a function or expression.
  • Parser/AST builds the tree (Chapter 2).
  • Codegen lowers to LLVM IR (Chapter 3).
  • Optimizer runs the FPM on the new function (this chapter).
  • JIT:
    • For def or extern: the function/prototype becomes available to later code.
    • For a top‑level expression: compile to __anon_expr, run it, print the result, then unload the temporary module.

5. Pitfalls and checks

  • Always set the Module's DataLayout from the JIT (TheJIT->getDataLayout()) so generated code matches the host ABI.
  • Verify each function (verifyFunction) before running passes and JIT-compiling it; a defensive variant is sketched after this list.
  • Keep optimizations per‑function for the REPL scenario; for an offline compiler you could run module‑level passes after parsing the whole file.
  • When JIT‑executing top‑level expressions, use a ResourceTracker so you can remove the temporary module cleanly.
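
A defensive variant of the verification step (the tutorial itself calls verifyFunction and does not check its result) could look like this in FunctionAST::codegen:

// Sketch: verifyFunction returns true when it finds problems.
if (verifyFunction(*TheFunction, &errs())) {
  TheFunction->eraseFromParent();     // discard the malformed function
  return nullptr;                     // and report failure to the caller
}
TheFPM->run(*TheFunction, *TheFAM);   // only optimize verified IR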

6. Example before/after optimization

Source:

def test(x) (1+2+x)*(x+(1+2));

Unoptimized shape (conceptually):

t1 = 3.0 + x         ; IRBuilder already folded each 1+2 down to 3.0
t2 = x + 3.0
t3 = t1 * t2

After FPM pipeline:

t1 = x + 3.0         ; reassociation rewrites both adds into this same form
t2 = t1 * t1         ; GVN then eliminates the duplicate add

Matches the tutorial’s optimized IR.


Summary

Chapter 4 upgrades Kaleidoscope from “IR printer” to a simple optimizing JIT:

  • Local simplifications via IRBuilder,
  • A small but effective per‑function pass pipeline, and
  • ORC JIT integration to execute top‑level expressions on the fly.

This sets the stage for control‑flow features in subsequent chapters while keeping compilation fast and interactive.
