When I first encountered MLIR, I assumed it was simply “LLVM, but adapted for machine learning.” That’s a common first impression — and it’s misleading.

LLVM is a compiler infrastructure centered around a single, low-level intermediate representation. It unified frontends and backends by providing a common optimization and code generation layer. But LLVM IR is intentionally minimal and low-level. Control flow is expressed through basic blocks. Structured loops are flattened. High-level semantics disappear quickly.

MLIR exists because that design, while powerful, is insufficient for modern workloads — especially tensor programs, graph-based computations, and domain-specific transformations.

The key insight behind MLIR is not that we need a better IR. It is that we need multiple IRs, each preserving the right level of abstraction at the right time.

The Structural Problem LLVM Couldn’t Solve

LLVM is extremely good at optimizing scalar programs and generating machine code. But imagine you want to perform polyhedral loop transformations, tensor fusion, or structured scheduling. These optimizations rely on structured information — loop boundaries, affine expressions, tensor shapes.

By the time code reaches LLVM IR, that structure is already lost.

Original C File

int sum_prefix(int *a, int n) {
  int s = 0;
  for (int i = 0; i < n; i++) {
    s += a[i];
  }
  return s;
}

The same function lowered to LLVM IR

define i32 @sum_prefix(ptr %a, i32 %n) {
entry:
  br label %loop.header

loop.header:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop.latch ]
  %s = phi i32 [ 0, %entry ], [ %s.next, %loop.latch ]

  ; loop condition: i < n ?
  %cond = icmp slt i32 %i, %n
  br i1 %cond, label %loop.body, label %exit

loop.body:
  ; load a[i]
  %idx = sext i32 %i to i64
  %elem.ptr = getelementptr i32, ptr %a, i64 %idx
  %elem = load i32, ptr %elem.ptr

  ; s += a[i]
  %s.next = add i32 %s, %elem
  br label %loop.latch

loop.latch:
  ; i++
  %i.next = add i32 %i, 1
  br label %loop.header

exit:
  ret i32 %s
}

You can try to reconstruct the loop structure from this CFG, but doing so is painful and brittle.

MLIR changes the timeline. Instead of lowering everything immediately into a flat IR, it allows high-level representations to coexist with lower-level ones. A tensor operation can remain a tensor operation long enough to be optimized meaningfully before being lowered into loops. An affine loop can remain affine until you’ve exhausted algebraic and structural transformations.
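
For contrast, here is a hand-written sketch of how the same reduction can stay structured in MLIR's affine dialect (types and names are illustrative, not compiler output):

func.func @sum_prefix(%a: memref<?xi32>, %n: index) -> i32 {
  %zero = arith.constant 0 : i32
  // one structured op: bounds, step, and the running sum (iter_args)
  // are explicit instead of being scattered across basic blocks
  %sum = affine.for %i = 0 to %n iter_args(%acc = %zero) -> (i32) {
    %elem = affine.load %a[%i] : memref<?xi32>
    %next = arith.addi %acc, %elem : i32
    affine.yield %next : i32
  }
  return %sum : i32
}

The same loop preserved in the affine dialect (sketch)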

This is why MLIR is not just “another IR.” It is a framework for defining IRs at multiple abstraction levels.

Dialects: IR as a Modular Concept

In MLIR, there is no single universal IR. Instead, there are dialects.

A dialect defines its own types, operations, invariants, and semantics. It is effectively a self-contained language embedded within MLIR’s infrastructure.
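
To make this concrete, a new dialect can be declared in a few lines of ODS/TableGen. A minimal sketch, modeled on the poly dialect that appears later in this post:

include "mlir/IR/DialectBase.td"

def Poly_Dialect : Dialect {
  let name = "poly";
  let summary = "A dialect for polynomial arithmetic";
  let cppNamespace = "::mlir::tutorial::poly";
}

Declaring a dialect in TableGen (sketch)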

This modularity is not cosmetic. It changes how optimization works.

In LLVM, every optimization pass must be correct for all IR. In MLIR, passes can target specific dialects. A pass that manipulates affine loops doesn’t need to understand tensor semantics. A pass that rewrites polynomial arithmetic doesn’t need to understand GPU kernels.

General MLIR Diagram

Optimization becomes scoped and composable.

This design dramatically lowers the barrier for building domain-specific compilers. You no longer need to fork LLVM or write a monolithic optimizer. You define a dialect, declare its semantics, and write transformations that operate only within that semantic space.

Progressive Lowering: Abstraction as a Resource

The most important conceptual shift in MLIR is progressive lowering.

Traditional compilation looks like this:

High-level language → LLVM IR → Machine code

MLIR replaces that linear pipeline with a layered one:

High-level dialect → mid-level dialect → ... → lower-level dialect → LLVM dialect → machine code

Lowering is explicit and incremental.

Each layer preserves just enough structure for the optimizations that belong at that layer. Once you lower too far, certain transformations become either impossible or extremely expensive to express.
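
Concretely, a lowering pipeline is just an ordered list of passes, each moving the IR one layer down. A minimal C++ sketch using upstream conversion passes (pass creation functions may differ across MLIR versions):

#include "mlir/Conversion/Passes.h"
#include "mlir/Pass/PassManager.h"

void buildLoweringPipeline(mlir::PassManager &pm) {
  pm.addPass(mlir::createLowerAffinePass());        // affine -> scf
  pm.addPass(mlir::createConvertSCFToCFPass());     // scf -> cf (CFG form)
  pm.addPass(mlir::createConvertControlFlowToLLVMPass());
  pm.addPass(mlir::createArithToLLVMConversionPass());
  pm.addPass(mlir::createConvertFuncToLLVMPass());  // func -> llvm dialect
}

A progressive lowering pipeline (sketch)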

This idea — that abstraction is a resource to be preserved until no longer needed — is the philosophical core of MLIR.

Two Ways to Transform IR

When you begin writing passes, you quickly encounter two distinct mental models.

The first is explicit traversal. You walk the IR tree and mutate operations directly. This feels familiar if you’ve worked on traditional compilers. It gives you complete control and allows global reasoning, such as common subexpression elimination or whole-function analysis.

#include "lib/Transform/Affine/AffineFullUnroll.h"
#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/Dialect/Affine/LoopUtils.h"
#include "mlir/include/mlir/Pass/Pass.h"

using mlir::affine::AffineForOp;
using mlir::affine::loopUnrollFull;

void AffineFullUnroll::runOnOperation() {
	// walk every affine.for in the function and fully unroll it
	getOperation().walk([&](AffineForOp op) {
		if (failed(loopUnrollFull(op))) {
			op.emitError("unrolling failed");
			signalPassFailure();
		}
	});
}

Explicit IR traversal: an affine full-unroll pass

The second model is pattern rewriting. Instead of scanning the IR manually, you declare rewrite rules: whenever a certain shape appears, replace it with another shape. The rewrite engine applies these patterns greedily until no more matches exist.

#include "lib/Transform/Arith/MulToAdd.h"
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
#include "mlir/include/mlir/Pass/Pass.h"

using namespace mlir;
using arith::AddIOp;
using arith::ConstantOp;
using arith::MulIOp;

// Define a pattern: inherit from OpRewritePattern, anchored on MulIOp.
// Rewrites x * c, where c is a power of two, as (x * c/2) + (x * c/2).
struct PowerOfTwoExpand : public OpRewritePattern<MulIOp> {
	// the constructor takes the context and a benefit (2 here); the
	// greedy driver prefers higher-benefit patterns when several match
	PowerOfTwoExpand(mlir::MLIRContext *context)
		: OpRewritePattern<MulIOp>(context, /*benefit=*/2) {}
		
	LogicalResult matchAndRewrite(MulIOp op, PatternRewriter &rewriter) const override 
	{
		Value lhs = op.getOperand(0); // variable
		Value rhs = op.getOperand(1); // constant
		
		auto rhsDefiningOp = rhs.getDefiningOp<arith::ConstantIntOp>();
		if(!rhsDefiningOp) {
			return failure();
		}
		
		// extract the constant value from the defining op
		int64_t value = rhsDefiningOp.value();
		// exclude 0 and 1: both pass the bit trick below but would
		// produce an incorrect or useless rewrite
		bool is_power_of_two = value > 1 && (value & (value - 1)) == 0;
		if (!is_power_of_two) {
			return failure();
		}
		
		ConstantOp newConstant = rewriter.create<ConstantOp>(
			rhsDefiningOp.getLoc(), rewriter.getIntegerAttr(rhs.getType(), value / 2)
		);
		MulIOp newMul = rewriter.create<MulIOp>(op.getLoc(), lhs, newConstant);
		AddIOp newAdd = rewriter.create<AddIOp>(op.getLoc(), newMul, newMul);
		
		rewriter.replaceOp(op, {newAdd});
		// do not erase the old constant here: it may have other users;
		// if it becomes dead, a later cleanup pass removes it
		
		return success();
	}
};

// Rewrites x * c as (x * (c - 1)) + x, peeling one addend off the
// multiplication; benefit 1, so PowerOfTwoExpand is tried first.
struct PeelFromMul : public OpRewritePattern<MulIOp> {
	PeelFromMul(mlir::MLIRContext *context)
		: OpRewritePattern<MulIOp>(context, /*benefit=*/1) {}
	
	LogicalResult matchAndRewrite(MulIOp op, PatternRewriter &rewriter) const override
	{
		Value lhs = op.getOperand(0); // variable
		Value rhs = op.getOperand(1); // constant
		
		auto rhsDefiningOp = rhs.getDefiningOp<arith::ConstantIntOp>();
		if(!rhsDefiningOp) {
			return failure();
		}
		
		// extract the constant value from the defining op
		int64_t value = rhsDefiningOp.value();
		// peeling from a non-positive or unit constant would recurse
		// indefinitely or add useless ops
		if (value <= 1) {
			return failure();
		}
		
		ConstantOp newConstant = rewriter.create<ConstantOp>(
			rhsDefiningOp.getLoc(), rewriter.getIntegerAttr(rhs.getType(), value - 1)
		);
		MulIOp newMul = rewriter.create<MulIOp>(op.getLoc(), lhs, newConstant);
		AddIOp newAdd = rewriter.create<AddIOp>(op.getLoc(), newMul, lhs);
		
		rewriter.replaceOp(op, {newAdd});
		// as above, leave the old constant for dead-code cleanup
		
		return success();
	}
};

// runOnOperation() wires the patterns into the greedy rewrite driver
void MulToAddPass::runOnOperation() {
	// prepare a pattern set
	mlir::RewritePatternSet patterns(&getContext());
	// register both patterns
	patterns.add<PowerOfTwoExpand>(&getContext());
	patterns.add<PeelFromMul>(&getContext());
	// run the rewrite engine greedily until no pattern matches,
	// trying higher-benefit patterns first
	(void)applyPatternsGreedily(getOperation(), std::move(patterns));
}

Pattern rewriting: the MulToAdd optimization pass

These two approaches reflect different transformation styles. Explicit walking is well-suited for global structural changes. Pattern rewriting is ideal for local algebraic simplifications.

Understanding when to use each is less about API knowledge and more about understanding the nature of your transformation.

TableGen: Power with Opacity

MLIR adopted LLVM's TableGen as a domain-specific language (DSL) to reduce boilerplate when defining dialects, types, and operations. TableGen can feel magical at first. You define an operation declaratively, and MLIR generates builders, verifiers, type inference logic, and registration code automatically.

include "mlir/Pass/PassBase.td"

def AffineFullUnroll : Pass<"affine-full-unroll"> {
	let summary = "Fully unroll all affine loops";
	let description = [{
		Fully unroll all affine loops.
	}];
	let dependentDialects = ["mlir::affine::AffineDialect"];
}

The AffineFullUnroll pass definition in TableGen

#ifdef GEN_PASS_DEF_AFFINEFULLUNROLL

namespace impl {
  std::unique_ptr<::mlir::Pass> createAffineFullUnroll();
} // namespace impl
namespace impl {

template <typename DerivedT>
class AffineFullUnrollBase : public ::mlir::OperationPass<> {
public:
  using Base = AffineFullUnrollBase;

  AffineFullUnrollBase() : ::mlir::OperationPass<>(::mlir::TypeID::get<DerivedT>()) {}
  AffineFullUnrollBase(const AffineFullUnrollBase &other) : ::mlir::OperationPass<>(other) {}
  AffineFullUnrollBase& operator=(const AffineFullUnrollBase &) = delete;
  AffineFullUnrollBase(AffineFullUnrollBase &&) = delete;
  AffineFullUnrollBase& operator=(AffineFullUnrollBase &&) = delete;
  ~AffineFullUnrollBase() = default;

  /// Returns the command-line argument attached to this pass.
  static constexpr ::llvm::StringLiteral getArgumentName() {
    return ::llvm::StringLiteral("affine-full-unroll");
  }
  ::llvm::StringRef getArgument() const override { return "affine-full-unroll"; }

  ::llvm::StringRef getDescription() const override { return "Fully unroll all affine loops"; }

// ... more generated code (class body and namespace closers) ...

std::unique_ptr<::mlir::Pass> createAffineFullUnroll() {
  return impl::createAffineFullUnroll();
}
#undef GEN_PASS_DEF_AFFINEFULLUNROLL
#endif // GEN_PASS_DEF_AFFINEFULLUNROLL

Autogenerated .inc file

As seen here, TableGen automatically generates methods such as getArgument() and getDescription() for us. But this convenience comes at a cost. TableGen is not an abstraction layer; it is a code generator. When something goes wrong, errors often originate in generated .inc files, and debugging requires understanding both the declarative specification and the generated C++.

More specifically, TableGen does not explicitly indicate which functions must be implemented for the code to compile. Developers often need to inspect the generated macros and .inc files to understand which pieces of auto-generated code must be included and where they should be injected.

To use MLIR effectively, you must eventually become comfortable reading the generated code. The abstraction only works if you understand what it expands into.

Traits: Optimization Contracts

One of MLIR’s most subtle mechanisms is its trait system.

Adding a trait like Pure may seem trivial, but it fundamentally affects which optimization passes can legally transform your operation. For example, loop-invariant code motion will only move operations that declare themselves free of side effects.

class Poly_BinOp<string mnemonic> : Op<Poly_Dialect, mnemonic, [Pure]> {
    let arguments = (ins Polynomial:$lhs, Polynomial:$rhs);
    let results = (outs Polynomial:$output);
    let assemblyFormat = "$lhs `,` $rhs attr-dict `:` type($output)";
}

Example of adding the Pure trait

class SubOp : public ::mlir::Op<SubOp,
	::mlir::OpTrait::ZeroRegions,
	::mlir::OpTrait::OneResult,
	::mlir::OpTrait::OneTypedResult<::mlir::tutorial::poly::PolynomialType>::Impl,
	::mlir::OpTrait::ZeroSuccessors,
	::mlir::OpTrait::NOperands<2>::Impl,
	::mlir::OpTrait::OpInvariants,
	::mlir::ConditionallySpeculatable::Trait,        // <-- new
	::mlir::OpTrait::AlwaysSpeculatableImplTrait,    // <-- new
	::mlir::MemoryEffectOpInterface::Trait>          // <-- new
{ ... }

Resulting changes in the autogenerated .inc file

Optimizations in MLIR are permission-based. Operations must declare their semantic properties explicitly. Without those declarations, the optimizer will conservatively avoid transforming them.

For example, adding the Pure trait allows certain optimization passes to safely operate on an operation. A concrete case is the -cse (Common Subexpression Elimination) pass. If two operations compute the same result with identical operands, -cse may replace the second with the first — but only if the operation has no side effects. By marking an operation as Pure, we explicitly declare that it does not read or write memory and has no observable side effects, making it safe for CSE to eliminate redundant instances.
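
As a sketch, using the poly dialect from this post (the IR shapes are illustrative):

%0 = poly.add %a, %b : !poly.poly<10>
%1 = poly.add %a, %b : !poly.poly<10>
// after -cse, uses of %1 are rewired to %0 and %1 is erased;
// this is legal only because poly.add is declared Pure

CSE deduplicating a Pure operation (sketch)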

This design makes optimization safer and more modular. It also means dialect authors must think carefully about semantic contracts.

However, this also reveals one of MLIR’s practical constraints. Because MLIR evolves rapidly, there is no comprehensive or stable documentation clearly specifying which passes depend on which traits or interfaces. In practice, developers often need to inspect pass implementations directly to understand the required semantic contracts and determine why a particular optimization does or does not apply.

Traits are not annotations. They are formal commitments about behavior.

Folding, Canonicalization, and Global Propagation

Folding is operation-local simplification. If an operation’s operands are constant attributes, it can compute its result immediately. Folding happens opportunistically and locally.
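
The hook behind this is an operation's fold method. A minimal sketch for a hypothetical integer AddOp whose operands are named lhs and rhs (the FoldAdaptor API is current upstream MLIR; details vary by version):

// returns a constant attribute when both operands are constant,
// and an empty result when there is nothing to fold
mlir::OpFoldResult AddOp::fold(FoldAdaptor adaptor) {
  auto lhs = llvm::dyn_cast_if_present<mlir::IntegerAttr>(adaptor.getLhs());
  auto rhs = llvm::dyn_cast_if_present<mlir::IntegerAttr>(adaptor.getRhs());
  if (!lhs || !rhs)
    return {}; // at least one operand is not a compile-time constant
  return mlir::IntegerAttr::get(lhs.getType(),
                                lhs.getValue() + rhs.getValue());
}

A fold hook (sketch)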

Canonicalization is more structural. It rewrites patterns across multiple operations to produce simpler forms.

Sparse conditional constant propagation goes further. It performs global analysis across control flow to deduce constant values. However, it does not eliminate dead code; canonicalization typically follows to clean up.

Understanding how these mechanisms interact is essential when designing optimization pipelines. Folding reduces local redundancy. Canonicalization simplifies structure. Global passes propagate information across control flow.

Each operates at a different scale.

Declarative Rewrite Patterns: When Rewriting Becomes a Language

Earlier we discussed pattern rewriting using C++. But MLIR goes further. It allows you to describe rewrites declaratively.

This is where PDL (Pattern Description Language) comes in.

PDL is not just syntactic sugar. It’s a meta-layer that treats rewrites as data. Instead of embedding rewrite logic in C++ classes, you describe patterns in a declarative form that MLIR can interpret or compile.

Why is this important?

Because rewriting is central to MLIR. Almost every transformation is a pattern-based rewrite.

When rewrites become data:

  • They can be reasoned about.
  • They can be generated.
  • They can be optimized.
  • They can be loaded dynamically.

PDL shifts rewriting from “hardcoded compiler logic” to something closer to a transformation DSL.

This is a powerful idea: the compiler’s transformation logic becomes programmable.
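
A small example, adapted from the PDL dialect documentation (foo.op and bar.op are placeholder operation names): the pattern below matches any foo.op with one operand and one result, and swaps in an equivalent bar.op.

pdl.pattern : benefit(1) {
  %type = pdl.type
  %operand = pdl.operand
  // match: any "foo.op" taking one value and producing one result
  %root = pdl.operation "foo.op"(%operand : !pdl.value) -> (%type : !pdl.type)
  pdl.rewrite %root {
    // rewrite: build a "bar.op" and replace the match with it
    %new = pdl.operation "bar.op"(%operand : !pdl.value) -> (%type : !pdl.type)
    pdl.replace %root with %new
  }
}

A rewrite expressed as PDL (sketch)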

Dialect Conversion: The Formal Lowering Framework

Progressive lowering sounds simple conceptually, but in practice it is extremely delicate.

How do you guarantee that after lowering, no illegal operations remain? How do you ensure partial lowering doesn’t leave dangling constructs?

This is where dialect conversion enters.

Dialect conversion is not just pattern rewriting. It is a constrained rewriting system with legality checks.

You define:

  • Which dialects are legal
  • Which operations are illegal
  • How illegal operations must be rewritten

The conversion framework then ensures that all illegal operations are rewritten into legal ones — or it fails.

This is incredibly important.

Without this structure, lowering becomes ad-hoc and unsafe.

Dialect conversion makes lowering declarative and verifiable.

It is the formal mechanism that turns “we should lower this” into “the IR is guaranteed to be in the target dialect.”

In other words, it transforms lowering from convention into contract.
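
In code, the contract looks roughly like this. A hedged sketch of a lowering pass for the poly dialect (PolyDialect and ConvertPolyAdd are hypothetical names):

#include "mlir/Transforms/DialectConversion.h"

void PolyToStandard::runOnOperation() {
  mlir::ConversionTarget target(getContext());
  // the contract: arith is legal, anything from poly is illegal
  target.addLegalDialect<mlir::arith::ArithDialect>();
  target.addIllegalDialect<PolyDialect>();

  mlir::RewritePatternSet patterns(&getContext());
  patterns.add<ConvertPolyAdd>(&getContext());

  // the framework fails the pass if any illegal op cannot be rewritten
  if (mlir::failed(mlir::applyPartialConversion(
          getOperation(), target, std::move(patterns)))) {
    signalPassFailure();
  }
}

A dialect conversion pass (sketch)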

Lowering Through LLVM: The Final Boundary

Eventually, many MLIR pipelines lower into the LLVM dialect.

This is the boundary between structured, semantic-rich IR and low-level code generation.

What’s interesting is that the LLVM dialect is not just LLVM IR pasted into MLIR. It is a structured representation of LLVM concepts within MLIR’s operation system.

Lowering through LLVM typically involves:

  • Converting structured control flow into CFG-based control flow
  • Converting high-level types into LLVM-compatible types
  • Mapping memory semantics carefully

Once in LLVM dialect, MLIR effectively hands control back to LLVM’s optimization and backend infrastructure.
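
That hand-off is mechanical. A sketch using upstream MLIR's translation API (header locations vary by version):

#include "mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h"
#include "mlir/Target/LLVMIR/Export.h"
#include "llvm/IR/Module.h"
#include <memory>

std::unique_ptr<llvm::Module> emitLLVM(mlir::ModuleOp module,
                                       llvm::LLVMContext &llvmContext) {
  // register how llvm-dialect ops translate into LLVM IR instructions
  mlir::registerLLVMDialectTranslation(*module.getContext());
  // from here on, LLVM's optimizer and backends take over
  return mlir::translateModuleToLLVMIR(module, llvmContext);
}

Translating the LLVM dialect into LLVM IR (sketch)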

This layered architecture is elegant:

  • High-level reasoning happens in structured dialects.
  • Low-level instruction scheduling and codegen happen in LLVM.

MLIR does not replace LLVM. It orchestrates the journey toward it.

Why This Matters for ML Systems

Modern ML workloads consist of tensor computations, structured loops, and domain-specific operations. Flattening these into low-level IR too early destroys information that is crucial for optimization.

MLIR allows compilers to reason about:

  • Tensor shapes
  • Loop bounds
  • Affine indexing
  • Memory effects
  • Operator semantics

And to preserve that reasoning until the right moment to lower.

This is why MLIR underpins many modern ML compilers. It enables aggressive optimization without sacrificing abstraction prematurely.

The Deeper Insight

LLVM unified code generation across languages. MLIR unifies compiler construction across domains.

It provides a way to define new IRs, attach semantics to them, transform them safely, and lower them progressively. It turns IR design itself into a modular and extensible discipline.

It is more complex than traditional compiler infrastructure. But that complexity reflects the complexity of modern workloads.

MLIR is not simply another intermediate representation.

It is a framework for thinking about representations.
