Files and Installation

Language Description Files

Seven kinds of files may be used to describe an object language to VINCI. They are:

The first three are mandatory for sentence generation; the rest need be specified only if the language description uses the features.

We have already encountered most of these file types in their respective Manual sections. The terminals file is simply a list of word categories appearing in the lexicon, for example:


    N
    V
    ADJ
    PREP
    ...

These will normally be the metavariables of terminal nodes, though we have mentioned several circumstances in which a node with a different metavariable may become terminal, leading to an error during lexical search.

Semantic transformations will be discussed in a future section of the Manual.

Lexical transformations are defined in:

described in the corresponding section.

Finally, three files may be used in connection with the student-error checking process:

These too will be described in the appropriate section.

For VINCI to make use of these files, they must be installed, using the commands provided in ivi/VINCI for this purpose.

Before dealing with these, however, we shall mention some features common to all the files.

Common Features

The file installation commands share a subprocess which provides common features to all the previously mentioned files.

Some of the following have been mentioned in earlier sections:

The sequences %include, %define and %macroname for any macroname behave as reserved identifiers. They must be separated from other identifiers by spaces or other non-identifier characters. So pqr%include "filename", for example, is not an include command.

The macrocall %macroname could actually be terminated by any non-identifier character, but the current implementation causes the terminating character to appear in the expanded text before the abbreviated sequence. Use of the space is therefore safest, as it will be discarded anyway.
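The separation rule can be illustrated with a minimal Python sketch. It is not VINCI's implementation. The function name find_reserved is hypothetical, and the sketch assumes that identifier characters are letters, digits and the underscore, which may not match VINCI's actual character set. It shows only why pqr%include is not recognized while a free-standing %include is.

```python
import re

# Assumption: identifier characters are letters, digits and underscore.
# VINCI's real character set may differ.
RESERVED = re.compile(r"(?<![A-Za-z0-9_])%([A-Za-z_][A-Za-z0-9_]*)")

def find_reserved(text):
    """Return the names following '%' that are not glued to a preceding identifier."""
    return RESERVED.findall(text)
```

Under these assumptions, find_reserved('pqr%include "filename"') finds nothing, because the % is immediately preceded by an identifier character.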

Installation Commands

The following commands read a file containing some component of a language description and set up the corresponding VINCI data structures. Each requires a <filename> as parameter. In most cases, installing a file discards (uninstalls) any previously installed file of the same kind. Some fuller comments follow the table.

    ATtribute     Install attributes
    TMnls         Install terminals
    LExicon       Install lexicon
    VAddlex       Add to lexicon
    SYntax        Install "main" syntax
    USer          Install "user" syntax
    MOrphology    Install morphology rules
    TBls          Install morphology tables
    SMtransf      Install semantic transformations
    LXtransf      Install lexical transformations
    MRpherror     Install morphological variants
    IPa           Install phonological variants
    TGinfo        Install lexical variants

As usual in ivi, only the first two letters of the command are typed; ivi supplies the rest. The name of the file follows, and <RETURN> triggers the installation.

The use of the first two letters only is, in fact, the reason for the unfriendly names. At last count, ivi included 75 commands, using up many of the pronounceable initial pairs!

Terminology. For the sake of abbreviation, we often refer to files by the first two letters of the command which installs them: TM files, AT files, and so on.
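As an illustration of the letter-pair convention, the following Python sketch derives the pair from each command name in the table above and confirms that the thirteen installation commands do not collide. The names COMMANDS, PAIRS and letter_pair are ours and are not part of ivi.

```python
# Command names as listed in the table of installation commands.
COMMANDS = [
    "ATtribute", "TMnls", "LExicon", "VAddlex", "SYntax", "USer",
    "MOrphology", "TBls", "SMtransf", "LXtransf", "MRpherror", "IPa", "TGinfo",
]

def letter_pair(command):
    """Return the two-letter abbreviation the user types for a command."""
    return command[:2].upper()

# Map each letter-pair back to its full command name.
PAIRS = {letter_pair(c): c for c in COMMANDS}

# Every command must have its own pair, or ivi could not tell them apart.
assert len(PAIRS) == len(COMMANDS)
```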

The LExicon and VAddlex commands read a lexicon either in textual form or as a set of ivi records (or indeed as a mixture). The LExicon command discards an existing lexicon. The VAddlex command does not, simply adding new lexical entries to the ones already present. This may well be used after new words have been generated with lexical transformations.

The internal data structure used by the installed lexicon relies on the attribute and terminal data, and changing these files invalidates the structure. For this reason, the AT and TM files must be installed before the lexicon, and the AT and TM commands uninstall any lexicon installed previously. These are just two of the dependencies between files; the complete set is given below.

The situation with SYntax and USer is more complicated. As we noted in the Syntax section, VINCI provides for two layers of syntax files. One, the SY file, usually contains a library of rules which will be used for many different sets of generated sentences; the other, the US file, contains rules which specify the particular sentences to be generated on this occasion. (So typically a ROOT rule will be in a US file.) There is no difference in form between the two varieties, and either file may be installed by either command. The difference lies in the action of the commands. The SY command uninstalls all previous syntax and installs its file. The US command uninstalls only the rules last installed by a US command, adding its rules to any which remain. It is the combination of the two sets which forms the current syntax: there may be only an SY file, only a US file, or both.

During generation, the rules are scanned from the last up. So a rule from the US file supersedes one from the SY file (or indeed, one of the same name higher up the US file), and so on.
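The behaviour of the two commands and the bottom-up scan can be modelled in a few lines of Python. This is a sketch of the behaviour described above and not VINCI's code. The class SyntaxStore and its methods are hypothetical, rules are reduced to (name, body) pairs, and the bodies are placeholder strings that are not real VINCI syntax.

```python
class SyntaxStore:
    """Toy model of the SY and US syntax layers."""

    def __init__(self):
        self.sy_rules = []  # rules installed by the SY command
        self.us_rules = []  # rules installed by the most recent US command

    def install_sy(self, rules):
        # SY uninstalls all previous syntax, then installs its file.
        self.sy_rules = list(rules)
        self.us_rules = []

    def install_us(self, rules):
        # US replaces only the rules last installed by a US command.
        self.us_rules = list(rules)

    def lookup(self, name):
        # Rules are scanned from the last up, so US rules are met first,
        # and a later rule supersedes an earlier one of the same name.
        for rule_name, body in reversed(self.sy_rules + self.us_rules):
            if rule_name == name:
                return body
        return None
```

In this model, installing an SY file with rules NP and ROOT and then a US file with its own ROOT makes lookup("ROOT") return the US body, while lookup("NP") still returns the SY body.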

A caution in advance for the Preselections section.

Preselection has been described in the Overview, and mention made of global and local preselections. It is important to realize that the terms global and local in that context refer to the time at which the preselections are made, not to the file in which the PRESELECT rule occurs. Global preselections are made once for many sentences; local ones are made over and over again for each individual sentence. It is very likely that the global PRESELECT rule will occur by itself in a US file, because if it hangs around and is not superseded by a local PRESELECT rule, it will be carried out again as if it were the local one, thus overriding the global selections. This, by the way, is why we have refrained from using the terms global and local for the two levels of syntax.

Warnings and errors detected by the installation commands are reported in ivi's corefile 7, which also keeps a log of the installations. It should be appreciated that errors in some files may have serious repercussions for later ones. For example, errors in defining attributes may cause many lexicon entries to be rejected. Persons writing language descriptions are therefore advised to monitor corefile 7 during input.

During lexicon installation, VINCI displays a "rolling" progress message on the second-to-last line of the screen, indicating the number of entries so far processed. This is one of a number of transient messages, called progress messages, which VINCI writes during installation, sentence generation, word creation, and so on. In VINCI's early days, when some of these activities took minutes, these messages confirmed to the user that the operations were still progressing. Today, the user may remain blissfully unaware that most of them even exist, since the time between their posting and their removal may be very short, and if this takes place between successive screen refreshes, the message will not even be displayed.

Note: the rolling display during lexicon input has since been removed.

Uninstallation

Installed files may be uninstalled by the command RM_vfiles. This takes a parameter consisting of one or more of the command letter-pairs; for example:

    RM LE, MO, TB

uninstalls the lexicon, and the morphology rules and tables files. Any of the command letter-pairs may appear except VA. (VINCI does not differentiate between lexicon entries installed by LE and VA.) In addition, the letter-pair GP is used to discard global preselections, and ALL to uninstall all entities.

We have already noted that some of the data structures resulting from installation rely on data previously installed, so that uninstallation (or new installation) of the latter requires uninstallation of the former. The dependencies (subject to review) are:

    Parameter    Also causes removal of
    AT           LE, MR, GP
    SY           US
    TM           LE, MR
    LE           GP

GP also discards local preselections as well as the most recently generated set of utterances. (The error-checking process, which might be called to operate on the current sentences, may fail if lexicon or preselections are no longer present.)

The files, including GP, actually discarded are reported in corefile 7.
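The dependency table can be encoded as a small lookup, sketched here in Python. The names ALSO_REMOVES and removal_set are ours. The sketch assumes that the table applies transitively (TM removes LE, and removing LE in turn removes GP); the Manual does not state this explicitly, and corefile 7 remains the authority on what was actually discarded. The ALL parameter is not modelled.

```python
# Direct dependencies, copied from the table above.
ALSO_REMOVES = {
    "AT": {"LE", "MR", "GP"},
    "SY": {"US"},
    "TM": {"LE", "MR"},
    "LE": {"GP"},
}

def removal_set(parameters):
    """Return every letter-pair discarded when the given ones are uninstalled.

    Assumption: dependencies are followed transitively.
    """
    removed = set()
    pending = list(parameters)
    while pending:
        pair = pending.pop()
        if pair in removed:
            continue
        removed.add(pair)
        # Queue whatever this pair's removal drags along with it.
        pending.extend(ALSO_REMOVES.get(pair, ()))
    return removed
```

For the earlier example RM LE, MO, TB, removal_set(["LE", "MO", "TB"]) yields LE, MO, TB and GP.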

Useful Hints

In order to keep track of the files which describe the sample languages we use for testing, research and teaching, we have found it convenient to give each language a short name:


    french, fairytale, ...

and to name each of the files (except for US files) by this name plus a suffix. The suffix consists of the first two letters of the installation command:


    french.at, french.tm, french.le, ...

There are, of course, many user files for each language, and some other meaningful names must be assigned to them, usually with the suffix .us or .u.

To simplify installation we write an ivi procedure with the name of the language as file name. The procedure executes the installation commands. So the file french contains:


    AT french.at
    TM french.tm
    LE french.le
    ...

Installation is then carried out by the ivi command:


    PR french

reducing work for the user, while ensuring that no files are missed, and that they are in the correct order. (Be sure, though, that the first command is on the first line of the procedure file; otherwise the line break on the first line will switch ivi to Typing Mode.)
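A procedure file of this kind can also be generated rather than typed. The Python sketch below assumes the naming convention described above (language name plus lower-case letter-pair suffix). The function procedure_text is hypothetical, and whether ivi accepts a trailing line break at the end of the file is an assumption. US files are left out, since they do not follow the convention.

```python
def procedure_text(language, pairs):
    """Build the text of an installation procedure file.

    `pairs` lists command letter-pairs in installation order,
    for example ["AT", "TM", "LE"].
    """
    # AT and TM must be installed before the lexicon.
    if "LE" in pairs:
        le_index = pairs.index("LE")
        for required in ("AT", "TM"):
            if required in pairs and pairs.index(required) > le_index:
                raise ValueError(f"{required} must be installed before LE")
    lines = [f"{pair} {language}.{pair.lower()}" for pair in pairs]
    # No leading blank line: the first command sits on the first line.
    return "\n".join(lines) + "\n"
```

Calling procedure_text("french", ["AT", "TM", "LE"]) produces the three lines shown in the example procedure above, and the order check rejects a list that places LE before AT or TM.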