Preselection

Preselection was described briefly in the Overview, and details have been given in the Syntax page and elsewhere. Here we gather the pieces in a single place, and describe the full generality of the various parts.

A PRESELECT rule is an optional component of a syntax (or user) file. It takes the form:


    PRESELECT
        clause1;
        clause2;
        ...
        clausep
    %

the final clause of which is not terminated by a semicolon. Each of the preselection clauses has one of two forms:

with right-hand side:


    tag1: node1;

or without:


    tag2;

The tags are compound attributes. (Recall that this term includes simple attributes as a special case.) The right-hand sides, where they exist, are terminal nodes in which all attachments may appear except for transformation requests.

The clauses take effect when a cluster of sentences is generated in the case of local preselections, or when the user executes the global preselection operation in the case of global ones. At this time, lexicon entries are selected for every clause which has a right-hand side. The selection process may make use of indirection, and may refer to lexicon entries selected by earlier clauses. The fact that these entries are selected before creation of the syntax tree is what gives rise to the term preselection.
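The selection step can be pictured with a minimal Python sketch. This is not VINCI's implementation: the data layout, the function name preselect, and the plain subset test used for matching are all assumptions made for illustration, and indirection and references to earlier clauses are left out.

```python
import random

def preselect(clauses, lexicon, rng=random):
    """Pick one lexicon entry for each clause that has a right-hand side.

    clauses: list of (tag, node) pairs; node is None for a tag-only clause,
             otherwise a (category, required_attributes) pair.
    lexicon: list of dicts with "word", "category" and "attributes" keys.
    Both layouts are hypothetical, chosen only for this sketch.
    """
    selected = {}
    for tag, node in clauses:
        if node is None:
            continue  # tag-only clause: no lexicon entry to select
        category, required = node
        # Assumed matching rule: same category, and every required
        # attribute value is present in the entry.
        candidates = [
            entry for entry in lexicon
            if entry["category"] == category and required <= entry["attributes"]
        ]
        if not candidates:
            raise LookupError(f"no lexicon entry matches clause {tag!r}")
        selected[tag] = rng.choice(candidates)
    return selected
```

The point of the sketch is the ordering: the table of selected entries exists before any syntax tree is built, so the syntax can consult it later by tag.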

Clauses which have only a tag are used to affect choices made by the syntax. The preselected lexicon entries can provide information to the syntax, or may become the lexicon entries for terminal/leaf nodes.

Pre-phrases

Pre-phrases provide the means for the syntax to access preselection clauses and their associated lexicon entries. Pre-phrases take one of two forms and can occur in some ten contexts, their action depending on the combination.

The two forms are C _pre_ D and _pre_ D, C and D being compound attribute patterns. In either case, D is searched for among the tags, from bottom to top and from local to global. (Thus a local tag supersedes a global one, and a later tag supersedes an earlier one.)
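The search order can be sketched in Python as follows. The function name find_tag, the representation of clauses as (tag, entry) pairs, and the use of plain equality as the default match are assumptions for illustration only; real compound attribute patterns are matched in a richer way.

```python
def find_tag(pattern, local_clauses, global_clauses,
             matches=lambda pattern, tag: pattern == tag):
    """Return the first (tag, entry) pair matching pattern, or None.

    Each clause list is in source order (top to bottom). Local clauses
    are searched before global ones, and each list is scanned from the
    bottom up, so a later tag supersedes an earlier one.
    """
    for clauses in (local_clauses, global_clauses):
        for tag, entry in reversed(clauses):
            if matches(pattern, tag):
                return tag, entry
    return None
```

For example, with two local clauses tagged villain and one global clause tagged villain, the lookup returns the second local clause.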

The contexts are:

In the attachment contexts, only the form _pre_ D is relevant, and it causes the lexicon entry preselected for tag D to become the selected entry for the terminal node.

In contexts (b), both forms of pre-phrase are appropriate, and compute one of the values true or false. The second form, _pre_ D, yields true if D is successfully matched to a preselection tag; false, otherwise.

In these contexts, the first form must be preceded by the keyword _is_ : _is_ C _pre_ D. (The keyword _in_ is a synonym for _pre_, and may be preferred in this case.) This asks whether the lexicon entry preselected for tag D contains an attribute matching C. For example:


    _is_ masc _in_ villain

determines whether the entry chosen for villain has masc as an attribute value.

In contexts (c) and (d), both forms are appropriate, and compute an attribute value. The value is added to the attribute list in contexts (c), and becomes the value of the attribute variable in context (d).

The form _pre_ D yields as its value the tag which D matches, and in this case, D may contain deconstruction slashes. So, for example:


    _pre_ Gender/obj

locates a tag having the pattern Gender.obj, and produces its Gender value.

The form C _pre_ D looks for, and produces, a value matching C in the lexicon entry preselected for tag D. For example:


    Gender _pre_ villain

returns the Gender value of the entry selected for the villain. In these contexts, C may contain deconstruction slashes.

Consistency
(This section can safely be omitted on first reading)

In certain circumstances, the form C _pre_ D can give rise to issues of interpretation and consistency. We will not discuss these fully here, but will illustrate the problem with some examples.

Consider two alternative preselection clauses:


    (C1) scélérat: N[humain, fém];   {= villain}
    (C2) scélérat: N[humain];

their preselected lexicon entry:


    (L) "étudiant"|N|Genre, humain, ...|...

and three pre-phrases:


    (P1) _is_ fém _pre_ scélérat   (yielding a Boolean result)
    (P2) _is_ masc _pre_ scélérat  (yielding a Boolean result)
    (P3) Genre _pre_ scélérat      (yielding an attribute)

In this case, the lexicon entry does not determine a Genre value for "étudiant". The question, therefore, is what to produce for the different clause and pre-phrase combinations. In designing VINCI, we have tried to anticipate what a user might expect. Unfortunately, there is sometimes no single answer; it depends on how the user views his/her question or attribute request.

For (C1, P1) and (C1, P3), the answer seems clear-cut. The lexicon entry was selected in response to a request for a feminine noun, and the user presumably expects the results: true and fém, respectively.

This suggests that the process must consult the attribute list attached to the preselection clause as well as the lexicon entry itself. (Note that there can be no conflict between the two; the entry would not have been selected unless it matched all the attribute values in the list.) Should it also consult the attribute lists of other clauses which are associated with the same entry?

The case (C1, P2) is not so clear. On the face of it, the result should be false. But this depends on some external knowledge. We are assuming that, while "étudiant" can be masc or fém, any given instance is one or the other, and that these are mutually exclusive. But VINCI does not require this and has no way of knowing whether the user expects it. In fact, in our work on aspectual morphology, verbs have a subset of values of type Aspect and it is this subset which determines the so-called tense. We have also written lexicon entries of the form:


    "Prince Charming"|N|human, handsome, brave, strong, ... | ...

in our fairy tale grammar, where value brave (of type Qualities) does not preclude other Qualities values: strong, handsome, etc.

This problem is even worse with non-simple compound attributes. As noted in the Syntax web page under Attribute Force, the presence of sing.subj in the list does not usually preclude plur.obj.

In the case (C2, P3), neither the lexicon entry nor the attribute list suggests a choice of Genre value. So VINCI must make a random choice, and give this as the result. But now we are faced with a question of consistency. If the generation process later comes upon another instance of P3 in the syntax, it will surely be expected to produce the same result. The implication is that the chosen attribute must be added to the attribute lists of all clauses with which the lexicon entry is associated. Similar issues arise if we encounter (C2, P1) or (C2, P2) before (C2, P3).

There is no "right" way to resolve these issues. What simplifies one application may complicate another.

The current implementation, which is inconsistent and subject to change at the whim of the authors, computes Boolean values by looking at the lexicon entry without reference to the attribute list. (So P1 and P2 both yield true for L.) It computes attribute values by first looking at the attribute list; then, if necessary, at the lexicon entry; and finally, by making a random choice and adding it to the attribute lists of all tree nodes which use the entry.
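The behaviour just described can be modelled with a short Python sketch, applied to (L), (C1) and (C2). It is a simplified illustration and not VINCI's code: the TYPES table, the function names, and the list representation of attributes are assumptions, and the random choice is recorded on a single attribute list rather than on every tree node which uses the entry.

```python
import random

# Hypothetical attribute type table: type name -> its possible values.
TYPES = {"Genre": ("masc", "fém")}

def type_of(value):
    """Return the name of the type that value belongs to, or None."""
    for name, values in TYPES.items():
        if value in values:
            return name
    return None

def entry_admits(entry_attrs, value):
    """Boolean pre-phrase: consult the lexicon entry only.

    The entry admits a value if it lists the value itself, or lists the
    value's type name (meaning the value is left undetermined).
    """
    return value in entry_attrs or type_of(value) in entry_attrs

def attribute_value(type_name, clause_attrs, entry_attrs, rng=random):
    """Attribute pre-phrase: attribute list, then entry, then random."""
    values = TYPES[type_name]
    for source in (clause_attrs, entry_attrs):
        for attr in source:
            if attr in values:
                return attr
    choice = rng.choice(values)
    clause_attrs.append(choice)  # remember the choice for later requests
    return choice

entry = ["Genre", "humain"]   # attributes of (L)
c1 = ["humain", "fém"]        # attribute list of (C1)
c2 = ["humain"]               # attribute list of (C2)
```

Under this model, entry_admits(entry, "fém") and entry_admits(entry, "masc") are both true, attribute_value("Genre", c1, entry) gives fém, and repeated calls of attribute_value("Genre", c2, entry) return whichever value was picked at random the first time.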

There are ways in which a user can bypass some of our choices. For example, (L) can be replaced by two entries:


        "étudiant"|N|masc, humain, ...|...
        "étudiante"|N|fém, humain, ...|...
    

Attribute types such as Qualities(handsome, brave, strong, ...) can be replaced by several: Bravery(brave, cowardly), Strength(weak, so_so, strong) ..., to make values mutually exclusive.

Repeated use of P3 can be replaced by a single SELECT clause, followed by repeated use of its variable.