Context-Free Grammar (CFG)
Context-Free Grammar (CFG)
1) CFG Basics
- A context-free grammar G = (V, Σ, R, S)
- V: nonterminals (variables)
- Σ: terminals (alphabet, tokens)
- R: production rules of the form A → α where A ∈ V and α ∈ (V ∪ Σ)*
- S: start symbol
-
Language of G: L(G) = { w ∈ Σ* S ⇒* w }
Example
E → E + T | T
T → T * F | F
F → ( E ) | id
- Generates arithmetic expressions with +, *, parentheses, and identifiers.
2) Derivations & Sentential Forms
- Single-step derivation: αAβ ⇒ αγβ if A → γ is a production.
- Leftmost derivation (LMD): always rewrite the leftmost nonterminal first.
- Rightmost derivation (RMD): always rewrite the rightmost nonterminal first.
- Sentential form: any string in (V ∪ Σ)* reachable from S by ⇒*.
- Yield: a sentential form consisting only of terminals (a sentence).
Example (LMD)
S ⇒ E
E ⇒ E + T
E ⇒ T + T
T ⇒ F + T
F ⇒ id + T
T ⇒ F
F ⇒ id
; result: id + id
3) Parse Trees
- A parse tree (derivation tree) visualizes a derivation:
- Root labeled with the start symbol S.
- Each internal node A expands to its children according to a production A → α.
- Leaves read left to right give the derived sentence.
- Parse trees abstract away order of derivation (LMD vs RMD) but reveal structure.
4) Ambiguity
- A grammar is ambiguous if some w ∈ L(G) has two or more distinct parse trees (or two distinct LMD/RMD).
- Classic example:
E → E + E | E * E | id - For
id + id * idthere are two parses:(id + id) * idvsid + (id * id).
- Fix with precedence/associativity encoded in grammar:
E → E + T | T ; '+' lowest precedence, left-assoc T → T * F | F ; '*' higher precedence, left-assoc F → ( E ) | id
5) Nullable, FIRST, FOLLOW
- Nullable(A): A ⇒* ε.
- FIRST(α): set of terminals that can begin strings derived from α (and ε if α ⇒* ε).
- FOLLOW(A): set of terminals that can appear immediately to the right of A in some sentential form (and
$for end-of-input when A can finish the input).
FIRST rules (summary)
- If X is terminal: FIRST(X) = {X}
- If X is nonterminal: add FIRST of its alternatives; include ε if any RHS ⇒* ε
- For concatenation αβ: FIRST(αβ) = FIRST(α) minus ε, plus FIRST(β) if α ⇒* ε
FOLLOW rules (summary)
- Put
$in FOLLOW(S) for start symbol S. - For any A → αBβ: add FIRST(β) \ {ε} to FOLLOW(B).
- If β ⇒* ε: add FOLLOW(A) to FOLLOW(B).
6) LL(1) Conditions (Top-Down Parsing)
A grammar is LL(1) if for every nonterminal A with alternatives A → α | β:
- Disjointness: FIRST(α) and FIRST(β) are disjoint.
- ε-case: If ε ∈ FIRST(α), then FIRST(β) must be disjoint from FOLLOW(A) (and vice versa).
Left Factoring
- If A → aβ1 | aβ2, factor common prefix:
A → aA' A' → β1 | β2
Eliminate Immediate Left Recursion
- If A → Aα | β, transform:
A → βA' A' → αA' | ε
7) LR Parsing Glance (Bottom-Up)
- Recognizes handles using items and states (LR(0)/SLR/LALR/LR(1)).
- Shift: push next input symbol.
- Reduce: apply A → α when top-of-stack matches α.
- Goto: transition on nonterminal after reduction.
8) Closure Properties (High-level)
- CFLs (context-free languages) are closed under: union, concatenation, Kleene star, homomorphism, reversal.
- Not closed under: intersection, complement (but are closed with regular languages, e.g., CFL ∩ REG is CFL).
9) Leftmost vs Rightmost Example
Grammar:
S → aS | Sb | ab
- String:
aab- LMD: S ⇒ aS ⇒ aaS ⇒ aab
- RMD: S ⇒ Sb ⇒ Sab ⇒ aab
10) Sample Grammar Exercises
- Balanced parentheses:
S → SS | (S) | ε - Binary numbers without leading zeros (except zero itself):
N → 0 | 1B B → 0B | 1B | ε - Palindromes over {a, b}:
P → aPa | bPb | a | b | ε
Quick Glossary
- Terminal: token from the input alphabet.
- Nonterminal: variable to be expanded.
- Production: rewrite rule.
- ε (epsilon): empty string.
- Handle: substring that matches the RHS of a production and whose reduction is a step in reverse of a rightmost derivation.