We have seen that the tiny English grammar of Note 37(b) generates not only the grammatically correct sentence:
THE RED CAT EATS THE BRAN
but also the incorrect:
THE RED CAT EAT THE BRAN
These illustrate an important limitation of context-free grammars. Once a node on a derivation tree has been developed into subnodes, no further communication between these subnodes is possible. Subsequent rewriting rules are applied to a subnode without consideration of its context; so, nothing which happens down one subtree can affect what happens down another. In the above example, once [sentence] has developed into [nphrase][verb][nphrase], nothing in the cfg mechanism can prevent the subject [nphrase] from developing into a singular noun phrase and [verb] into a plural verb.
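To see the point in action, here is a small Python sketch. The rule set is my assumed reconstruction of the Note 37(b) grammar, for illustration only, not its exact text. Each subnode is expanded independently, so the generator produces grammatical and ungrammatical sentences with equal readiness:

    import random

    # assumed reconstruction of the tiny grammar of Note 37(b):
    # each non-terminal maps to a list of alternative right-hand sides
    RULES = {
        "sentence":  [["nphrase", "verb", "nphrase"]],
        "nphrase":   [["THE", "adjective", "noun"]],
        "adjective": [["RED"]],
        "noun":      [["CAT"], ["BRAN"]],
        "verb":      [["EATS"], ["EAT"]],     # singular and plural forms
    }

    def generate(symbol):
        # each subnode is expanded with no knowledge of its siblings
        if symbol not in RULES:               # a terminal word
            return [symbol]
        rhs = random.choice(RULES[symbol])
        return [word for part in rhs for word in generate(part)]

    print(" ".join(generate("sentence")))
    # outputs include THE RED CAT EATS THE BRAN and the ungrammatical
    # THE RED CAT EAT THE BRAN: the verb cannot see the subject subtree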
A similar situation arises in programming languages. The syntax should ensure that only declared variables appear in subsequent statements; and that variables declared as a particular type appear only in contexts appropriate to that type. But nothing in the cfg mechanism can prevent the generation of:
Boolean x;
x = 16;
Parsing algorithms must therefore interrupt the context-free analysis whenever a variable is encountered, to check that the variable is valid in its context.
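A minimal sketch of such a check, with purely illustrative names (no particular compiler is being described): the parse itself stays context-free, while a symbol table carries the contextual information.

    symbols = {}                          # identifier -> declared type

    def declare(name, typ):
        symbols[name] = typ

    def check_assignment(name, value_type):
        # the extra-grammatical check performed alongside the cf parse
        if name not in symbols:
            raise NameError(name + " used before declaration")
        if symbols[name] != value_type:
            raise TypeError(name + " is " + symbols[name] + ", not " + value_type)

    declare("x", "Boolean")
    check_assignment("x", "integer")      # rejects x = 16, which the cfg cannot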
A second limitation arises from the fact that, once a rule A -> BCD has been applied during a cfg derivation, all symbols in the final string derived from B must precede all those in the final string derived from C, which in their turn precede those in the string derived from D.
If we consider the sentences:
THE RED CAT WILL EAT THE BRAN
WILL THE RED CAT EAT THE BRAN?
THE BRAN WILL BE EATEN BY THE RED CAT
the second and third, though surely related to the first, cannot be obtained from it by the application of additional cf rules.
We can, of course, write separate sections of the grammar to yield interrogative and passive sentences, just as we can separate singular and plural phrases in the earlier example, but this is unsatisfying as a means of describing linguistic processes. Furthermore, it would not be feasible to apply the same technique to variable names in a programming language.
A device used in some linguistic theories is to add "transformational rules", which allow the first sentence to be transformed into the second and third. The first sentence is then referred to as indicating the "deep structure" of the second and third, while the latter represent the "surface structure". We will not pursue this direction further here.
Beyond Context-free
Context-sensitive grammars
Example 1: Consider the context-sensitive grammar:
VN: {S, A, B}
VT: {a, b, c}
Initial Symbol: S
P: {S-> Abc, S -> ABSc, BA -> AB, Bb -> bb, A -> a}
and attempt to derive some of the strings of its language.
You will quickly discover that a type 1 grammar differs from a cfg in two significant ways: you can get "blocked" derivations, i.e. you can derive strings containing non-terminals to which no rules can be applied, and the order in which the rules are applied matters. Indeed, to prove that a type 1 grammar yields ONLY the strings you want, you usually have to show that applying the rules in an unintended order leads only to blocked strings.
This grammar derives all strings of the form {a^n b^n c^n | n >= 1}.
To see this, label the rules (i) to (v).
Begin by applying rule (ii) n - 1 times, followed by rule (i) once. This derives ABAB ... Abcc ... c, that is (AB)^(n-1)Abc^n, in which there are n A's, n - 1 B's together with a single b, and n c's.
Rule (iii) allows B to move rightwards past A, so that repeated applications permit the B's to move out from among the A's. Rule (iv) allows B to change into b, but only on contact with an existing b. Now the initial b is to the right of all A's. So no B becomes b until it has crossed over all A's to its right. Rule (v) simply permits A to become a.
Note that application of these rules in unexpected order does not yield anything extra. For example, if we apply rule (iii) before rule (i), some movement of B's takes place earlier than intended, but no B can ever get across the S, so (i) must still be applied before (iv) can be used. If (v) is applied too early, some B's may be locked permanently to the left of a's, and so on. [In fact, I no longer remember why I included A and rule (v); we might just as well have employed a and Ba -> aB to start with.]
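You can also let a program do the experimenting. The following sketch (mine, not part of the original notes) applies every rule at every position of every reachable sentential form; since no rule shortens a string, bounding the length keeps the search finite:

    RULES = [("S", "Abc"),    # (i)
             ("S", "ABSc"),   # (ii)
             ("BA", "AB"),    # (iii)
             ("Bb", "bb"),    # (iv)
             ("A", "a")]      # (v)

    def language_up_to(max_len):
        terminal = set()
        seen = {"S"}
        frontier = ["S"]
        while frontier:
            form = frontier.pop()
            for lhs, rhs in RULES:
                i = form.find(lhs)
                while i != -1:              # try the rule at every position
                    new = form[:i] + rhs + form[i + len(lhs):]
                    if len(new) <= max_len and new not in seen:
                        seen.add(new)
                        if new.islower():   # no non-terminals left
                            terminal.add(new)
                        else:
                            frontier.append(new)
                    i = form.find(lhs, i + 1)
        return sorted(terminal, key=len)

    print(language_up_to(9))   # prints ['abc', 'aabbcc', 'aaabbbccc']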
Example 2: If VT is some terminal alphabet with at least two symbols, the language {ww | w is in VT*} is not context-free. We can, however, write a context-sensitive grammar for it.
The trick is to create "catcat" by first generating "cCaAtT" and then shifting the non-terminals to the right. Actually we will do this in two stages. As a first attempt, we will include a special character, #, which disappears at the end; but "# -> empty" is not a context-sensitive rule, so a further adaptation will avoid this.
A type 0 grammar which generates {ww | w in VT*} is as follows:
Let VT be {a, b, c, ...} and VN be {A, B, C, ..., S, #} with S as the initial symbol, where VT contains n symbols and VN, n + 2. Let the rules be:
(i) all rules of the form S -> aAS, S -> bBS, ... (n rules)
(ii) all rules of the form S -> aA#, S -> bB#, ... (n rules)
(iii) all rules of the form Aa -> aA, Ab -> bA, Ba -> aB, Bb -> bB, ... (n^2 rules)
(iv) all rules of the form A# -> #a, B# -> #b, ... (n rules)
(v) # -> empty
Rules (i) and (ii) can be used to create any possible string of the form aAbBbBcCaAdD#. Rules (iii) allow the upper-case (uc) letters to move right across the lower-case (lc); but uc cannot move across uc, nor lc across lc. Rules (iv) allow # to move left across uc, transforming them to lc. Since # cannot cross lc, it cannot transform any uc unless the uc has crossed all lc to its right. Finally, rule (v) allows # to vanish. Note that if rule (v) is applied before all the "unscrambling" and the transformation of uc to lc, then no terminal string can result.
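For instance, taking w = ab:

S => aAS => aAbB# => abAB# => abA#b => ab#ab => abab

using rules (i), (ii), (iii), (iv), (iv) and (v) in turn.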
As we have noted, (v) is not a type 1 rule, and there can be no way to eliminate # in this grammar using type 1 rules, since the form derived by (i) and (ii) is longer than the desired terminal string. To overcome this, we need the "end-marker" itself to become a symbol of the terminal string.
Introduce, then, n different #-symbols, #_a, #_b, ..., one for each member of VT. Replace (ii) and (v) by
(ii)' S -> #_a a, S -> #_b b, ... (n rules)
(v)' #_a -> a, #_b -> b, ... (n rules)
and (iv) by n^2 rules allowing any #-symbol to cross left over any uc and transform it to its corresponding lc. Of course, only one #-symbol appears in any derivation, and it is destined to become the rightmost terminal in the left half of the string. Otherwise, the behaviour is the same as that of the type 0 grammar. As before, if (v)' is applied too early, a non-terminal will be trapped permanently in the string.
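Taking w = ab again, the context-sensitive version runs:

S => aAS => aA#_b b => a#_b ab => abab

using (i), (ii)', (iv)' and (v)' in turn; the #-symbol #_b ends up as the b which closes the first copy of w.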
This grammar, incidentally, provides a clue about constructing a context-sensitive grammar in which variables are only used in proper context. If you want a csg to generate some set of strings of the form x1 w x2 w x3, where x1, x2, x3 and w are terminal strings and the same w occurs twice over, first generate a string of the form:
x1 aAbBcCdD x2 # x3
where in this instance we want w to be abcd; then we have A, B, C, D move right until they encounter #, etc. It is difficult, however, to imagine a programming language grammar which requires all copies of an identifier to be created in this manner!
Example 3: Consider again the sentences:
THE RED CAT WILL EAT THE BRAN
WILL THE RED CAT EAT THE BRAN?
Let N denote a noun-phrase, V a verb and M a modal verb ("will"). Then we may derive the first using a cf rule S -> NMVN, and the second by way of a cs rule NM -> MN.
Replace the latter rule by three rules of the kind permitted in Chomsky's original definition, each rewriting a single symbol in context:
NM -> XM (i.e. N -> X in the right-context of M)
XM -> XN (i.e. M -> N in the left-context of X)
XN -> MN (i.e. X -> M in the right-context of N)
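The interrogative is then derived by

S => NMVN => XMVN => XNVN => MNVN

after which M, N, V and the final N develop into WILL, THE RED CAT, EAT and THE BRAN respectively.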
Now, recall that with rules in this form, we may draw a derivation tree for a context-sensitive derivation:
              S
           /  |  \     (etc.)
          N   M   V
          |   |   |
          X   N   eat
          |  / \
          M the cat
          |
        will
This derivation certainly produces the second sentence, but a glance at the tree shows some oddities. "will" is seen as a modal verb, but higher up the tree, it is also shown as a noun-phrase! Similarly, "the cat" is seen as a noun-phrase, but also as a modal verb! The tree, then, is not very useful.
Examples 1 and 2 above illustrate the fact that the behaviour of a csg is far more difficult to grasp than that of a cfg.
We therefore explore two of the several ways which have been suggested for exceeding the power of context-free rules while retaining their clarity. One divides the cf rules into subsets and restricts the moments at which each subset can be used. The other provides for infinitely many cf rules, individual rules being created using rule-schemata.
Time-varying grammars
In a "periodic time-varying context-free grammar, with period k" the rewriting rules are arranged in sets P(1), P(2), ..., P(k), and at step t in a derivation, a rule must be chosen from set P(r) where r = t mod k. [In other words, the sets are used cyclically.] Consider the ptvcf-3 grammar:
VN: {S,X,Y,Z,U}
VT: {a,b,c}
Init: S
P: P(1) v P(2) v P(3)
where
P(1) = {S -> XYZ, Z -> cZ, Z -> c}
P(2) = {X -> aX, X -> a, U -> b}
P(3) = {Y -> bY, Y -> U}
Derive some strings, and observe that it generates the language {a^n b^n c^n | n >= 1}. Think about the role of U.
The intended order of application is:
S -> XYZ,
(n - 1) times (X -> aX, Y -> bY, Z -> cZ),
X -> a, Y -> U, Z -> c, U -> b
and this yields the nth string in the language.
If you depart from this precise order (for example, if you apply Z -> c before the other two; or, having applied X -> a, you continue with Y -> bY), you reach a dead end: a string containing non-terminals to which no rule for that moment applies.
There are many ways to see this. Perhaps the easiest is to note that, after the first moment, the string contains one each of X, Y and Z, and that it requires a total of four steps to eliminate them. Now, once you eliminate either Y or Z, you can never pass through P(3) or P(1) again, so neither of these can be the first to go. Thus the first must be X. But if you eliminate X, you cannot pass P(2) again unless U is present. Hence, Y -> U must follow immediately. And since we cannot now pass through P(3) again, we must eliminate Z straight away. And that leaves U -> b as the only applicable production.
Suppose you remove U from the grammar and include Y -> b in P(3). Then the argument of the last two paragraphs no longer holds. The "final loop" can begin with Y -> b or with Z -> c, and you can derive, for example (read => as "derives"):
S => XYZ => aXYZ => aXbZ => aXbc => aabc
or
S => XYZ => aXYZ => aXbYZ => aXbYc => aabYc => aabbc
In effect, the "wasted step" provided by U forces the final stage to begin in P(2), preventing early departure from the "main loop".
Two-level (or W-) grammars
In the following, we resort once again to [ ] in place of angle-brackets to denote non-terminals, because angle-brackets are interpreted by HTML.
Let G be a grammar having infinitely many rewriting rules and infinitely many non-terminals as follows:
(i) a, b, c are terminals,
(ii) [S], [XR] are non-terminals for all X and R defined below,
(iii) [S] -> [aR] [bR] [cR]
[XrR] -> [X] [XR]
[X] -> X
are rule schemata (templates) which yield rewriting rules for all choices of X and R defined below, PROVIDED THAT each particular choice of X and R is substituted consistently for that letter throughout a schema,
(iv) [S] is the initial symbol,
(v) X and R are defined by the following grammar, in which {a, b, c, r} are terminals and l is the empty string: X -> a | b | c; R -> rR | l.
Write down some typical non-terminals and some typical rewriting rules of G. What language does G generate?
X is any of a, b, c; R is any of r^n (n >= 0).
The non-terminals of the language are therefore [S] together with [Xr^n] for each X and each n >= 0, i.e. [S], [a], [b], [c], [ar], [br], [cr], [arr], [brr], [crr], ...
The rules are:
[S] -> [ar^n] [br^n] [cr^n] for each n >= 0 [t1]
[ar^(n+1)] -> [a] [ar^n] for each n >= 0 [t2]
[br^(n+1)] -> [b] [br^n] for each n >= 0
[cr^(n+1)] -> [c] [cr^n] for each n >= 0
[a] -> a, [b] -> b, [c] -> c [t3]
A typical derivation takes the form:
[S] => [arr] [brr] [crr] [t1, with n = 2]
=> ... => [a][ar] [b][br] [c][cr] [t2, with n = 1]
=> ... => [a][a][a] [b][b][b] [c][c][c] [t2, with n = 0]
=> ... => aaabbbccc [t3]
So we have once again our favorite csl {a^n b^n c^n | n >= 1} [proving that a cfg with an infinity of rules can generate a language which is not a cfl].
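Since any single derivation uses only finitely many instances of the schemata, we can test the scheme mechanically by instantiating it up to a bound. A sketch (mine, not part of the original text; the bound N = 3 is arbitrary):

    N = 3                                   # instantiate R = r^n for n <= 3 only
    RULES = []
    for n in range(N + 1):
        R = "r" * n
        RULES.append(("[S]", "[a{0}][b{0}][c{0}]".format(R)))       # schema t1
        for X in "abc":
            if n > 0:                                               # schema t2
                RULES.append(("[" + X + R + "]", "[" + X + "][" + X + R[1:] + "]"))
            else:                                                   # rules t3
                RULES.append(("[" + X + "]", X))

    def derive(form):
        # keep applying the first applicable rule at its leftmost occurrence
        for lhs, rhs in RULES:
            i = form.find(lhs)
            if i != -1:
                return derive(form[:i] + rhs + form[i + len(lhs):])
        return form

    # apply t1 with n = 2 by hand (since [S] has several alternatives),
    # then let the program finish the derivation
    print(derive("[arr][brr][crr]"))        # prints aaabbbccc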
Note that X and R play rather different roles here. X merely permits a finite set of closely similar rules to be summarized by a smaller number of templates, without affecting the context-free nature of the language. In this example, we could have replaced [t2] by three templates and [t3] by three rules, omitting X altogether.
And yet it has a tidying effect. For example, with X ranging over s and p, the syntax rules for noun-verb agreement in the tiny English example:
[sentence] ::= [pnphrase] [pverb] [anynphrase] | [snphrase] [sverb] [anynphrase]
[anynphrase] ::= [pnphrase] | [snphrase]
[snphrase] ::= THE [snqual]
[pnphrase] ::= THE [pnqual]
[snqual] ::= [snoun] | ...
[pnqual] ::= [pnoun] | ...
might be reduced to
[sentence] ::= [Xnphrase] [Xverb] [anynphrase]
[anynphrase] ::= [Xnphrase]
[Xnphrase] ::= THE [Xnqual]
[Xnqual] ::= [Xnoun] | ...
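The proviso that each choice of X be substituted consistently throughout a schema is exactly what one textual substitution per template provides. A small Python sketch of the expansion (mine, for illustration):

    # each template expands once for every consistent substitution of X;
    # a single textual replacement per template enforces the consistency
    TEMPLATES = [
        "[sentence] ::= [Xnphrase] [Xverb] [anynphrase]",
        "[anynphrase] ::= [Xnphrase]",
        "[Xnphrase] ::= THE [Xnqual]",
        "[Xnqual] ::= [Xnoun] | ...",
    ]
    for X in ("s", "p"):
        for template in TEMPLATES:
            print(template.replace("X", X))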
And there are similar instances in the definition of Algol 60 where large blocks of rules could be summarized in this way.
R, on the other hand, yields an infinite set of non-terminals and rules. It is R which is responsible for passing beyond the cfl limitations.
A mechanism like this was used in the specification of the language Algol 68.