Normalization

Strong vs weak normalization. Normal-order always finds it if it exists.

8 topics • ~940 words

Confluence told us that if a term has a normal form, all paths that terminate agree on what it is. But confluence says nothing about whether those paths actually terminate. A term might have one path to a normal form and another that loops forever. Normalization is the question of termination: does reduction always finish, or can it get stuck in an infinite loop?

Strong Normalization

A term is strongly normalizing if every reduction sequence starting from it eventually reaches a normal form. There is no path that diverges -- no matter which redex you choose at each step, you will always terminate.

Strong normalization is the strongest termination guarantee. It means reduction strategy does not matter: any order of reductions will finish. You cannot accidentally choose a path that loops forever.

Not all lambda terms are strongly normalizing. But when a term is, you have complete freedom in how you reduce it.

Weak Normalization

A term is weakly normalizing if at least one reduction sequence reaches a normal form. There might be other sequences that diverge, but the term is not hopeless -- some path terminates.

Weak normalization is a weaker guarantee than strong normalization. A weakly normalizing term has a normal form, but you might need to be careful about which redexes you choose. Pick the wrong path and you could loop forever, even though a normal form exists.

The classic example: (λx.a) Ω is weakly normalizing. Normal order reduces the outer redex to get a. But applicative order tries to reduce Ω first and diverges. The term has a normal form, but not every path reaches it.
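Python's own evaluation is eager (applicative order), so a direct transliteration of (λx.a) Ω diverges. Wrapping the argument in a zero-argument lambda -- a thunk -- mimics normal order by passing the argument unevaluated. This is an illustrative sketch, not part of the calculus itself:

```python
delta = lambda x: x(x)      # λx. x x
const_a = lambda x: "a"     # λx. a -- discards its argument

# Applicative order (Python's default): const_a(delta(delta)) evaluates
# the argument first and never returns (Python raises RecursionError
# rather than looping silently).

# Normal-order style: pass the diverging argument as an unevaluated thunk.
result = const_a(lambda: delta(delta))
print(result)  # a -- the argument is discarded without ever being forced
```

The thunk is never called, so the divergence inside it never happens -- exactly the behavior normal order gets for free.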

Untyped Lambda Calculus Is Not Strongly Normalizing

Is the untyped lambda calculus as a whole strongly normalizing? No.

Strong normalization would mean that every term, under every reduction sequence, reaches a normal form. But we already know a counterexample: Ω = (λx.x x)(λx.x x).

Ω reduces to itself at every step. It has no normal form at all -- not under any strategy. One divergent term is enough to disprove strong normalization for the entire system.
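Ω can be transliterated into Python directly. Because Python evaluates eagerly, applying `delta` to itself makes no progress; Python surfaces the divergence as a `RecursionError` instead of spinning forever:

```python
delta = lambda x: x(x)   # λx. x x

try:
    delta(delta)         # Ω = (λx. x x)(λx. x x)
except RecursionError:
    print("no normal form: reduction never terminates")
```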

This is not a defect. The ability to express non-terminating computations is what makes the untyped lambda calculus Turing complete. A system where every computation must terminate cannot express all computable functions.

Normal Order Always Finds It

The standardization theorem gives normal order a unique status among reduction strategies:

If a term has a normal form, normal order will find it.

Normal order reduces the leftmost outermost redex first. This outside-in approach means functions are applied before their arguments are evaluated. If a function discards its argument, normal order never wastes time reducing that argument -- even if that argument would have diverged.

This is why normal order is called complete with respect to normalization. Other strategies may diverge on weakly normalizing terms, but normal order never misses a normal form that exists.

The guarantee does not extend to terms with no normal form. Ω diverges under every strategy, including normal order. The theorem says: if a normal form is there, normal order will reach it.

Applicative Order Can Miss Normal Forms

Applicative order reduces the innermost redex first -- it fully evaluates arguments before passing them to functions. This can be a problem when an argument diverges but the function would have discarded it.

Consider (λx.a) Ω, where Ω = (λx.x x)(λx.x x).

The function λx.a ignores its argument entirely -- the body a does not mention x. The normal form is simply a.

Normal order reduces the outer redex first: (λx.a) Ω →β a. Done in one step. The argument Ω is discarded unreduced.

Applicative order tries to reduce the argument first: (λx.a) Ω → (λx.a) Ω → ... It attempts to reduce Ω, which reduces to itself. Applicative order loops forever and never applies the function.

The normal form a exists. Normal order finds it. Applicative order misses it.
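The whole comparison can be made executable with a small interpreter. The sketch below (the tuple representation and helper names are my own, not from the text) implements one-step reduction for both strategies, with a fuel limit standing in for "diverges":

```python
# Terms: ('var', name) | ('lam', name, body) | ('app', fun, arg)

def subst(t, x, v):
    """Substitute v for free occurrences of x in t.
    Naive: ignores variable capture, which is safe for the closed examples below."""
    if t[0] == 'var':
        return v if t[1] == x else t
    if t[0] == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, v))
    return ('app', subst(t[1], x, v), subst(t[2], x, v))

def step_normal(t):
    """One normal-order step: leftmost-outermost redex. Returns None at normal form."""
    if t[0] == 'app':
        f, a = t[1], t[2]
        if f[0] == 'lam':                      # apply before evaluating the argument
            return subst(f[2], f[1], a)
        s = step_normal(f)
        if s is not None:
            return ('app', s, a)
        s = step_normal(a)
        if s is not None:
            return ('app', f, s)
    elif t[0] == 'lam':
        s = step_normal(t[2])
        if s is not None:
            return ('lam', t[1], s)
    return None

def step_applicative(t):
    """One applicative-order step: reduce inside subterms before applying."""
    if t[0] == 'app':
        f, a = t[1], t[2]
        s = step_applicative(f)
        if s is not None:
            return ('app', s, a)
        s = step_applicative(a)                # argument must finish first
        if s is not None:
            return ('app', f, s)
        if f[0] == 'lam':
            return subst(f[2], f[1], a)
    elif t[0] == 'lam':
        s = step_applicative(t[2])
        if s is not None:
            return ('lam', t[1], s)
    return None

def reduce(t, step, fuel=100):
    """Iterate `step` until normal form or fuel runs out.
    Returns (term, reached_normal_form)."""
    for _ in range(fuel):
        nxt = step(t)
        if nxt is None:
            return t, True
        t = nxt
    return t, False

DELTA = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))   # λx. x x
OMEGA = ('app', DELTA, DELTA)                               # Ω
term = ('app', ('lam', 'x', ('var', 'a')), OMEGA)           # (λx.a) Ω

print(reduce(term, step_normal))           # (('var', 'a'), True) -- one step
print(reduce(term, step_applicative)[1])   # False -- fuel exhausted, Ω keeps reducing to Ω
```

Note the only difference between the two step functions is where the `f[0] == 'lam'` check sits: before recursing into subterms (normal order) or after (applicative order). That single ordering decision is what separates finding the normal form from missing it.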

Which Strategy Finds It?

Different reduction strategies handle weakly normalizing terms differently. Some strategies may diverge on a term even when a normal form exists, because they insist on reducing subterms that the function would discard.

One strategy has a special guarantee, backed by the standardization theorem: normal order. If a term has a normal form, normal order will always find it, because it reduces outermost redexes first, applying functions before evaluating arguments.

Types Guarantee Termination

The untyped lambda calculus is not strongly normalizing because terms like Ω can diverge. But what if we restrict which terms are allowed?

The simply-typed lambda calculus adds type annotations to variables and requires that every term be well-typed. A key consequence: Ω = (λx.x x)(λx.x x) is not well-typed. The self-application x x requires x to have a type that is simultaneously a function and its own argument -- no finite type can satisfy this.

With self-application ruled out, the door to divergence closes. The simply-typed lambda calculus is strongly normalizing: every well-typed term, under every reduction sequence, reaches a normal form. Types act as a static guarantee of termination.
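The impossibility is the "occurs check" familiar from type inference: typing x x forces x : T and simultaneously x : T → U, and no finite type satisfies T = T → U because T would have to contain itself. A minimal sketch of that check (representation and names are illustrative):

```python
def occurs(var, ty):
    """Does type variable `var` occur inside type `ty`?
    Types are either a string (type variable) or ('->', arg, res)."""
    if ty == var:
        return True
    if isinstance(ty, tuple):
        return occurs(var, ty[1]) or occurs(var, ty[2])
    return False

# Typing x x demands T = T -> U. The occurs check rejects the equation:
print(occurs('T', ('->', 'T', 'U')))   # True -- T occurs in T -> U, so unification fails
```

A unifier that sees `occurs` return True refuses the equation, which is exactly why no simply-typed term can be Ω.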

The Termination Tradeoff

Strong normalization sounds like an unqualified good: every computation terminates, no infinite loops, no divergence. Why not always use a strongly normalizing system?

Because termination comes at a cost. The simply-typed lambda calculus is strongly normalizing, but it is not Turing complete. There are computable functions it cannot express.

Turing completeness requires the ability to express computations that might not terminate. A system that guarantees all computations halt cannot compute everything a Turing machine can -- it is strictly less powerful.

The untyped lambda calculus makes the opposite choice: it allows non-termination and gains Turing completeness in return. Ω is the price of being able to compute anything computable.

This is a fundamental tradeoff in the theory of computation, not an engineering decision. No system can be both strongly normalizing and Turing complete.

Ready to test your understanding?

Bitwit uses spaced repetition to help you truly master concepts like this -- not just read about them. Each card generates with different values, so you can't just memorize answers.

Practice Normalization →