next up previous
Next: Linear Indexed Grammars Up: On NLP Formalisms: from Previous: Introduction

Indexed Grammars - An Extension of Context-Free Grammars

 

Aho states the motivation of formalizing IGs in his original publication [1] that part of this interest in larger classes of language (i.e., larger than context-free languages) stems from the inadequacy of context-free grammars in specifying all of the syntactic structures found in many modern-day algorithmic programming languages, such as ALGOL. The class of indexed languages assumes a natural position in the Chomsky hierarchy of languages, which has been showed to includes all context-free languages and some context-sensitive languages, and can be generated using a nested stack automaton. We give the definition of an indexed grammar from Aho [1] below.


defn65

Many other definitions such as the reflexive and transitive closures are identical to the ones defined in context-free grammars. Following the IG definition, it is not hard to see all context-free languages can be generated by IGs, and thus belongs to indexed languages. Clearly, if the F set is empty, then an IG is a context-free grammar. Aho [1] proved that indexed languages are more closely resemble context-free languages than they are to context-sensitive languages under the full abstract family of languages (full AFL), where indexed and context-free languages are full AFLs, and context-sensitive languages are not. It is well known that a full AFL is a class of languages closed under the operations of union, concatenation, Kleene closure, homomorphism, inverse homomorphism, and intersection with regular sets. It is also showed [1] that indexed languages are closure under word reversal and substitution. We do not get into full details of the proofs here. Instead, an examples is given below from [1] to get some taste of IGs comparing with context-free and context-sensitive grammars.

Example 1. tex2html_wrap_inline850. The language is well-known not context-free. However, the following IG generates the language. Let tex2html_wrap_inline852, where P consists of the productions tex2html_wrap_inline856 and tex2html_wrap_inline858, and tex2html_wrap_inline860.

Applying the first production once, the second production n - 1 times, tex2html_wrap_inline864, and then the third production once, we have:

tex2html_wrap_inline866.

Expanding B by consuming indices, we have:

tex2html_wrap_inline870.

It is easy to show that the only strings in terminals which can be derived from S in tex2html_wrap_inline874 are all of the form tex2html_wrap_inline876. Thus, tex2html_wrap_inline878. The language can be easily generated with a context-sensitive grammar too. A formal proof of the containment within context-sensitive languages for indexed languages is provided in [1].


next up previous
Next: Linear Indexed Grammars Up: On NLP Formalisms: from Previous: Introduction

Henry Xiao
Fri Apr 29 18:50:42 EDT 2005