Context-Free Grammar (CFG)

1) CFG Basics

  • A context-free grammar G = (V, Σ, R, S)
    • V: nonterminals (variables)
    • Σ: terminals (alphabet, tokens)
    • R: production rules of the form A → α where A ∈ V and α ∈ (V ∪ Σ)*
    • S: start symbol
  • Language of G: L(G) = { w ∈ Σ* S ⇒* w }

Example

E → E + T | T
T → T * F | F
F → ( E ) | id
  • Generates arithmetic expressions with +, *, parentheses, and identifiers.

2) Derivations & Sentential Forms

  • Single-step derivation: αAβ ⇒ αγβ if A → γ is a production.
  • Leftmost derivation (LMD): always rewrite the leftmost nonterminal first.
  • Rightmost derivation (RMD): always rewrite the rightmost nonterminal first.
  • Sentential form: any string in (V ∪ Σ)* reachable from S by ⇒*.
  • Yield: a sentential form consisting only of terminals (a sentence).

Example (LMD)

S ⇒ E
E ⇒ E + T
E ⇒ T + T
T ⇒ F + T
F ⇒ id + T
T ⇒ F
F ⇒ id
; result: id + id

3) Parse Trees

  • A parse tree (derivation tree) visualizes a derivation:
    • Root labeled with the start symbol S.
    • Each internal node A expands to its children according to a production A → α.
    • Leaves read left to right give the derived sentence.
  • Parse trees abstract away order of derivation (LMD vs RMD) but reveal structure.

4) Ambiguity

  • A grammar is ambiguous if some w ∈ L(G) has two or more distinct parse trees (or two distinct LMD/RMD).
  • Classic example:
    E → E + E | E * E | id
    
  • For id + id * id there are two parses:
    • (id + id) * id vs id + (id * id).
  • Fix with precedence/associativity encoded in grammar:
    E → E + T | T          ; '+' lowest precedence, left-assoc
    T → T * F | F          ; '*' higher precedence, left-assoc
    F → ( E ) | id
    

5) Nullable, FIRST, FOLLOW

  • Nullable(A): A ⇒* ε.
  • FIRST(α): set of terminals that can begin strings derived from α (and ε if α ⇒* ε).
  • FOLLOW(A): set of terminals that can appear immediately to the right of A in some sentential form (and $ for end-of-input when A can finish the input).

FIRST rules (summary)

  • If X is terminal: FIRST(X) = {X}
  • If X is nonterminal: add FIRST of its alternatives; include ε if any RHS ⇒* ε
  • For concatenation αβ: FIRST(αβ) = FIRST(α) minus ε, plus FIRST(β) if α ⇒* ε

FOLLOW rules (summary)

  • Put $ in FOLLOW(S) for start symbol S.
  • For any A → αBβ: add FIRST(β) \ {ε} to FOLLOW(B).
  • If β ⇒* ε: add FOLLOW(A) to FOLLOW(B).

6) LL(1) Conditions (Top-Down Parsing)

A grammar is LL(1) if for every nonterminal A with alternatives A → α | β:

  • Disjointness: FIRST(α) and FIRST(β) are disjoint.
  • ε-case: If ε ∈ FIRST(α), then FIRST(β) must be disjoint from FOLLOW(A) (and vice versa).

Left Factoring

  • If A → aβ1 | aβ2, factor common prefix:
    A → aA'
    A' → β1 | β2
    

Eliminate Immediate Left Recursion

  • If A → Aα | β, transform:
    A  → βA'
    A' → αA' | ε
    

7) LR Parsing Glance (Bottom-Up)

  • Recognizes handles using items and states (LR(0)/SLR/LALR/LR(1)).
  • Shift: push next input symbol.
  • Reduce: apply A → α when top-of-stack matches α.
  • Goto: transition on nonterminal after reduction.

8) Closure Properties (High-level)

  • CFLs (context-free languages) are closed under: union, concatenation, Kleene star, homomorphism, reversal.
  • Not closed under: intersection, complement (but are closed with regular languages, e.g., CFL ∩ REG is CFL).

9) Leftmost vs Rightmost Example

Grammar:

S → aS | Sb | ab
  • String: aab
    • LMD: S ⇒ aS ⇒ aaS ⇒ aab
    • RMD: S ⇒ Sb ⇒ Sab ⇒ aab

10) Sample Grammar Exercises

  1. Balanced parentheses:
    S → SS | (S) | ε
    
  2. Binary numbers without leading zeros (except zero itself):
    N → 0 | 1B
    B → 0B | 1B | ε
    
  3. Palindromes over {a, b}:
    P → aPa | bPb | a | b | ε
    

Quick Glossary

  • Terminal: token from the input alphabet.
  • Nonterminal: variable to be expanded.
  • Production: rewrite rule.
  • ε (epsilon): empty string.
  • Handle: substring that matches the RHS of a production and whose reduction is a step in reverse of a rightmost derivation.

Updated: