A Gentle Introduction to VINCI

Overview

VINCI is a Natural Language Generation Environment which is embedded in an editing environment (ivi) which itself runs in a terminal window.

In what follows, we will step through some basic examples which will show you how to use ivi/VINCI to call up and modify elements of a language specification, use them for generation, and save and manipulate the generated output.

After you have finished with this introduction, you should be ready to read more detailed documentation and create and test grammars on your own.

A note on typographical conventions

Before we begin, please note the following conventions:

Operating systems and terminal windows

ivi/VINCI runs on any of three operating systems: Solaris, Linux, and Cygwin (a Unix-like environment that runs on Windows).

In what follows, we will provide you with a few basic instructions for working in these environments, but you should be aware that Unix (and Linux and Cygwin) are extremely powerful and complex environments in which an enormous range of operations is available. Many tutorials are available to help you become familiar with them. For help, search the web for "unix tutorial", "linux tutorial", or "cygwin tutorial" and follow the links provided.

All of these environments have in common that they provide access to a terminal window in which commands may be entered and programs run. A typical terminal window looks like this:

In some of these environments, a terminal window will be open by default, but in others you will need to start one up using the commands appropriate to the operating system.

Creating a working directory

Once a terminal window is open, the first step is to create a working directory. In what follows, we will assume it's called MyVinci, but you are free to choose whatever name you wish.

To do this, first, go to your home directory by typing in the terminal window:

cd <Return>
You should see a prompt which looks something like this:

/home/youruserid/:~$
Next, create the MyVinci directory by typing:

mkdir MyVinci <Return>
Move to the newly created MyVinci directory by typing:

cd MyVinci <Return>
To check that you're in the correct directory, type:

pwd <Return>
You should see something like this:

/home/youruserid/MyVinci
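
For readers who would rather script this setup, the same steps can be sketched in Python. This is only an illustration and is not part of ivi/VINCI: the function name make_working_dir and its home parameter are our own inventions, and MyVinci is simply the directory name assumed above.

```python
from pathlib import Path


def make_working_dir(name="MyVinci", home=None):
    """Create the working directory and return its full path.

    Mirrors `cd`, `mkdir MyVinci` and `pwd`. The `home` parameter is a
    hypothetical convenience so the sketch can be tried somewhere other
    than the real home directory.
    """
    base = Path(home) if home is not None else Path.home()
    workdir = base / name
    # exist_ok=True makes the call safe to repeat, unlike a second `mkdir`.
    workdir.mkdir(exist_ok=True)
    # resolve() gives the absolute path, much as `pwd` would print it.
    return workdir.resolve()
```

Calling make_working_dir() with no arguments creates MyVinci in your home directory, if it is not already there, and returns its location.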

You are now ready to download and install the ivi/VINCI software.

Installing ivi/VINCI

Disclaimer

The ivi/VINCI software is a research tool, not a commercial product. We offer it for downloading free of charge in a spirit of sharing and of making the tool available to others who think it might be useful. It must be understood, however, that especially in regard to the VINCI component, we have never had the time or resources to test it thoroughly enough. Though it serves us well, we sometimes encounter errors or even crashes, particularly if the input data departs far enough from what the program expects. We make no claims about its quality and give no warranties. Please accept it for what it is, and use it at your own risk.

We would like to hear about interesting applications, along with suggestions for additions or generalizations to fill gaps that you encounter, but we cannot undertake to respond to questions or to reports of errors.

From your web browser, download the most recent version of the ivi/VINCI program for your particular platform by clicking on the appropriate link below:

When you perform the download, make sure the downloaded file is placed in the MyVinci directory using the appropriate commands for your browser. (Often, right-clicking on the download link will give you a choice of directories in which to save the file.)

You can check that the file has been placed in the MyVinci directory by typing the following command in the terminal window, from within that directory:

ls <Return>
Depending on the version you have downloaded, you should see something like this:

ivi_solaris.tar
or this:

ivi_linux.tar
or this:

ivi_cygwin.tar

If you don't see one of these, there has been a problem with the download: either the file was not copied, or it was not placed in the proper directory. Check these problems and if necessary, ask a more knowledgeable user for assistance.

The program file has been downloaded in a packed format designed to ensure that it arrives safely. To use it, you must begin by unpacking it.

From within your MyVinci directory, unpack the ivi/VINCI program by typing the appropriate command, depending on your environment, from the following list:

tar xvf ivi_solaris.tar <Return>
or

tar xvf ivi_linux.tar <Return>
or

tar xvf ivi_cygwin.tar <Return>

For convenience, you should now rename the unpacked program from its longer name (ivi_solaris, ivi_linux, or ivi_cygwin) to the shorter name of simply ivi. This may be accomplished by means of the mv command. To do this, type:

mv ivi_solaris ivi <Return>

or the equivalent command for the Linux or Cygwin version (mv ivi_linux ivi or mv ivi_cygwin ivi).
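
The unpack-and-rename step can also be sketched in Python using the standard tarfile module. The function name unpack_and_rename is ours, and the sketch assumes, as the text above describes, that the archive contains a single program file named after the archive without its .tar suffix.

```python
import tarfile
from pathlib import Path


def unpack_and_rename(archive, short_name="ivi"):
    """Unpack an archive such as ivi_linux.tar and rename the program to ivi.

    Assumption: the archive holds one file whose name is the archive
    name without the .tar suffix (for example ivi_linux).
    """
    archive = Path(archive)
    workdir = archive.parent
    with tarfile.open(archive) as tar:      # like: tar xvf ivi_linux.tar
        tar.extractall(workdir)
    unpacked = workdir / archive.stem       # ivi_linux.tar -> ivi_linux
    if not unpacked.exists():
        raise FileNotFoundError(f"expected {unpacked} after unpacking")
    target = workdir / short_name
    unpacked.rename(target)                 # like: mv ivi_linux ivi
    return target
```

The explicit check after unpacking reports a missing file immediately, rather than leaving the problem to surface later when ./ivi is typed.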

Check whether ivi is available by typing in the terminal window:

./ivi <Return>

You should see a screen that looks like this:

To exit ivi, first make sure you are in command mode by typing <Ctrl c>, then type:

GO <Return>
GO is short for GOodbye. You should be returned to the terminal window.

Before going any farther, you should familiarize yourself with the ivi editor. A short ivi tutorial may be found here. Read it, then practice with ivi until you are comfortable with it.

Downloading example datafiles

To help with the explanations to follow, a number of simple datafiles have been created by the authors of VINCI. To download these, click here. Make sure they are placed in the MyVinci directory. Verify that this has been done by opening a terminal window, going to the MyVinci directory and typing:

ls <Return>
You should see the file tutorial_datafiles.tar.gz in the file listing. If it is there, you are safe to proceed. If it's not, there has been a problem with the download and you should repeat the steps above or consult a more experienced computer user.

Assuming that the file has been safely downloaded, you now need to uncompress it by typing:

gunzip tutorial_datafiles.tar.gz <Return>

and then

tar xvf tutorial_datafiles.tar <Return>

In the terminal window, enter the command ls to see the list of files. Among others, you should see the files att1.at and lex1.le.

If all is well, you are now ready to begin testing the generation environment. If there is a problem, repeat the previous steps, or consult a more experienced computer user.

Steps in generation

In the ivi/VINCI environment, generation involves four steps:

  1. creating new language description files, or alternatively locating and possibly modifying existing files;

  2. installing language description files into the generator;

  3. generating output and examining it inside ivi;

  4. saving generated output as files.

In a single working session, you may go through these steps any number of times.

Examining, editing or creating language description files

We will begin by examining some of the files we have just downloaded. To do this, we start up ivi (by typing ./ivi). Our first goal is to examine a file which defines the parts of speech available for generation. In what follows, we will sometimes refer to these as terminals, and as a result, the file which describes them is called a terminals file. Files used by ivi/VINCI may have any name composed of letters, digits, or underscores. By convention, we use a one- or two-letter suffix to help in sorting files, but in creating new files, feel free to adopt whatever convention suits you.

Call in the terminals file term1.tm for inspection using the FEtch command, by typing:

FE term1.tm <Return>

The editor screen should now look like this:

Note that the first line of the file is enclosed in brace brackets. This is a comment; it is there for the human reader only. Anywhere in VINCI language files, anything inside brace brackets is invisible to the generator. It is good practice to add many comments to your files to make them easier to interpret by others, or by you at some later date.

The next line of the file contains the letter N followed by a comment telling us that N stands for a noun, and the following line contains the sequence DET and a comment telling us that it stands for a determiner.

It should be clear that the terminals file defines the permissible parts of speech to be used by the generator. We will see later that other language description files will refer back to this information.

A note: the names of parts of speech are defined by you and may take any form, as long as they begin with a letter or underscore and contain no spaces. However, by convention, we always use only uppercase letters.
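
To make the role of comments concrete, here is a small Python sketch which mimics what the generator does when it ignores material in brace brackets. It is not VINCI code. The sample text is reconstructed from the description above rather than copied from term1.tm, and the sketch assumes that comments are not nested and that a terminals file contains nothing but names and comments.

```python
import re

# Reconstructed from the description above, NOT copied from term1.tm.
SAMPLE_TERMINALS = """\
{A sample terminals file}
N     {noun}
DET   {determiner}
"""


def strip_comments(text):
    """Remove everything inside brace brackets (assumes no nested braces)."""
    return re.sub(r"\{[^}]*\}", "", text)


def read_terminals(text):
    """Return the part-of-speech names left once comments are removed."""
    return strip_comments(text).split()
```

With the sample above, read_terminals(SAMPLE_TERMINALS) yields the two names N and DET, and nothing from the comments.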

Now call in another file which describes a simple syntax rule. To do so, type:

FE syn1.sy <Return>
The file should look like this:

There are several important elements to the syntax file. The first is the presence of comments, just as in the terminals file.

The second element is the keyword ROOT. Keywords are defined within VINCI and cannot be changed. ROOT tells the generator that a new syntax tree is to be started. It is followed by an equals sign and then the definition of the tree. In this instance, the tree has only one node, called N. This refers back to the N as defined in the terminals file we have just seen.

The third element is the percent sign, which tells the generation system that the rule has ended.

This is a very simple rule, which basically says that there is a syntax tree with a ROOT and one child.
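
The three elements just described (a name, an equals sign followed by a definition, and a closing percent sign) can be illustrated with a toy parser in Python. This is a sketch under our own assumptions: it handles only a flat rule of the form ROOT = N %, the function name parse_rule is hypothetical, and the real VINCI syntax reader does a good deal more.

```python
import re


def parse_rule(text):
    """Split a rule of the form `NAME = CHILD ... %` into (name, children)."""
    text = re.sub(r"\{[^}]*\}", "", text)    # comments are invisible
    body, percent, _ = text.partition("%")   # the percent sign ends the rule
    if not percent:
        raise ValueError("rule is not terminated by %")
    name, equals, right = body.partition("=")
    if not equals:
        raise ValueError("rule has no equals sign")
    return name.strip(), right.split()
```

Given the text ROOT = N %, the function returns the name ROOT together with a one-element list of children containing N: a tree with a ROOT and one child.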

Now let us read in another language description file which contains lexical information. To do so, type the command:

FE lex1.le <Return>
The file should look like this:

This file is a bit different in that it's formed of records, each on a single line. Each record defines a lexical entry. Records are composed of fields separated by vertical bars. The role of some fields is set by VINCI, but the user may use others for a variety of purposes.

The first field gives the headword. Note that it's in double quotes. The second field gives the part of speech. It is not in quotes. Material in quotes belongs to the language being generated (the object language), while other symbols belong to the metalanguage.

The first entry in the lexicon is "cat"; its part of speech is N (standing for Noun, as defined in the terminals file; note how different files refer to common information).

The third and fourth fields are empty and the fifth contains the symbol #1. This is a simple morphology rule. For the moment, it is sufficient to know that #1 tells the generator to use the first field when this lexical entry is called in generation.
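
The record structure can be made explicit with a few lines of Python. The sketch below is ours and not part of VINCI: it splits a record on vertical bars and picks out the three fields discussed so far. It assumes that a headword never contains a vertical bar.

```python
def parse_entry(line):
    """Split one lexicon record into the fields discussed in the text."""
    fields = [field.strip() for field in line.strip().split("|")]
    if len(fields) < 5:
        raise ValueError("a record needs at least five fields")
    return {
        "headword": fields[0].strip('"'),   # field 1: quoted, object language
        "pos": fields[1],                   # field 2: part of speech
        "morphology": fields[4],            # field 5: for example #1
    }
```

For a record such as "cat"|N|||#1| the result has the headword cat, the part of speech N and the morphology field #1. Fields 3 and 4 are simply empty strings at this stage.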

We have now seen three files. In order to generate output, we need to make them available to the generator itself, a process we call installing the files. There exist separate commands to install each file. In the next section, we will see how they work.

Installing files

One of the special features of ivi is that diagnostic messages are shown in Core 7. As a result, it is sometimes useful to test generation files from within Core 7. To do this, enter the command CO 7, where CO is short for COre. After the command has been entered, the screen should look like this:

The Setting random seed message is there because when ivi is started, a random seed is set which will control choices made in generation. We will see later that this may be used to repeat precisely the same output in subsequent generations. For the moment, it is safe to ignore the message.

It is now time to install the files needed for generation. Files must be installed in order, since some make use of others. Begin by installing the terminals file by means of the TMnls command. To do this, type:

TM term1.tm <Return>

If the command is successful, a message should appear in the text area of Core 7 which looks like this:

On the other hand, if you have made a typing mistake, or the file is not available, you will see an error message on the status line, just above the command line, which reads:

File not found (or no read permission)

In that case, just ensure that the file exists and retype the command correctly.

Once the terminals file has been installed, you can install the syntax file using the SYntax command by typing:

SY syn1.sy <Return>

If all goes well, you should see the message:

Finished reading Syntax Input from file 'syn1.sy' [6 lines.]

Finally, install the lexicon file by means of the LExicon command, by typing:

LE lex1.le <Return>

You should see the message:

Finished reading lexicon input from file 'lex1.le' [3 lines.]
Total Words: 3. Number of Errors: 0

Generating output

You are now ready to begin generation. To do this, move to text mode (by hitting the enter key while in command mode). The cursor should move to the text area. Now generate an utterance by typing <esc> <g> (depress and release the escape key and then depress and release the g key).

You should see something like this:

Congratulations! You have just generated your first utterance.

Now, generate another utterance by typing esc-g again. You may see either "cat" or "dog" (two of the words from the lexicon file). Generate several more utterances and note that the two words are chosen at random. However, the third item from the lexicon file ("the") is never chosen because its part of speech (DET) is not specified in the syntax file.
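
What happens at each esc g can be pictured with the following Python sketch. It is a deliberately simplified model of our own, not the VINCI algorithm: for each terminal under ROOT it picks at random among the lexical entries with that part of speech. The seeded random.Random object merely stands in for the random seed mentioned earlier.

```python
import random

# The three entries described for lex1.le, reduced to (headword, part of speech).
LEXICON = [("cat", "N"), ("dog", "N"), ("the", "DET")]


def generate(children, lexicon, rng):
    """Pick one random headword for each terminal under ROOT."""
    words = []
    for pos in children:
        # Only entries whose part of speech matches the node are eligible.
        candidates = [word for word, word_pos in lexicon if word_pos == pos]
        words.append(rng.choice(candidates))
    return " ".join(words)


rng = random.Random(42)                  # a fixed seed makes the run repeatable
sample = [generate(["N"], LEXICON, rng) for _ in range(5)]
```

Because the rule asks only for an N, the entry for "the" is never among the candidates, which is exactly the behaviour observed above. Rerunning with the same seed reproduces the same sequence of words.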

We will now remedy that.

Revising language description files

First, return to Core 1. To do this, type <Control c> to return to Command mode, then type CO 1 <Return>. You should now be in Core 1. Now, call the initial syntax file back in for editing by typing:

FE syn1.sy <Return>

You now want to change the file. First, press <Return> to enter text mode and then move the cursor on top of the N. Then, hit the Insert key or type Ctrl w to enter Insert Mode. (The command line should now read Expecting Insert.) Now type DET followed by a space, so that ROOT is now equal to the sequence DET N.

Save the revised file under a new name by typing:

SA syn2.sy <Return>

To ensure that the new syntax file exists, you can use the FEtch command by typing:

FE syn2.sy <Return>

The new file should appear on the screen. (Typing the command FE syn1.sy should recall the old syntax file.)

Generating new output from revised files

Let us now return to Core 7 (by typing CO 7 <Return>) and install the new syntax file, thereby replacing the old one. To do this, enter the command:

SY syn2.sy <Return>

A confirmation of the new file should appear in the text area. If it does, you are ready to generate a new set of utterances. First return to text mode (by hitting Return) and then type <esc> <g> several times. You should see something like this:

Generating output in other corefiles

So far, all output has appeared in Core 7, interspersed with diagnostic and error messages. It would be nice simply to see the output alone. To do this, move to an empty corefile (in this case, Core 2) by entering the command CO 2. In Core 2, enter text mode by typing Return and enter the command:

esc m 0 <Return>

You should see in the text area either

the cat
or

the dog

Now, type esc m 0 <Return> three more times. You should see three more occurrences of the same string you saw the first time. This is because esc m 0 <Return> simply inserts into the current corefile the currently generated string. To get a new string, you need to type esc g and then esc m 0 <Return>. If you do this enough times, you will see a different string in the corefile.
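
The difference between generating a string and inserting it can be modelled in a few lines of Python. The class below is a toy of our own devising (the names Session, generate and insert_current are hypothetical): generate stands in for esc g, and insert_current for esc m 0.

```python
import random


class Session:
    """Toy model: `generate` stands in for esc g, `insert_current` for esc m 0."""

    def __init__(self, phrases, seed=0):
        self._phrases = phrases
        self._rng = random.Random(seed)
        self.current = None     # the most recently generated string
        self.corefile = []      # lines inserted into the current corefile

    def generate(self):
        """Produce a new string; nothing is written to the corefile."""
        self.current = self._rng.choice(self._phrases)
        return self.current

    def insert_current(self):
        """Copy the current string into the corefile without regenerating."""
        if self.current is None:
            raise RuntimeError("nothing has been generated yet")
        self.corefile.append(self.current)
```

Calling insert_current four times in a row appends the same string four times. Only a call to generate in between can change what is inserted next.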

Saving output to a file

To save the results of this output, go to command mode (<ctrl c>) and enter the command SA fred <Return> (feel free to choose something else in place of fred). To exit ivi, type the command GO and hit <Return>. You should find yourself back in the terminal window. Next time you return to ivi, typing FE fred will call the output file back.

You have now called in some already existing files, used them for generation, edited one of the files, generated again, saved your output and exited ivi. Everything which follows will be a variation on this.

Using attributes

One of the problems with generated output so far is that there is no way of inflecting words to show number (cat - cats), tense (run - ran) etc. In VINCI, one of the ways this may be accomplished is by means of attributes.

In their simplest form, attributes are sets of values allocated among distinct classes. Users may define any names they choose for classes and values, as long as names begin with a letter or underscore and include no spaces. By convention, in what follows and elsewhere in our research, attribute classes begin with a capital letter, while attribute values are all in lowercase.

To be used, attributes must be specified in a file which is installed before any other file which uses attributes. To illustrate this, we will begin by examining a simple attribute file. Make sure you are in the MyVinci directory and then start ivi. Inside ivi, type:

FE att1.at <Return>
You should see a file which looks like this:

Examination of this file shows that it defines a class Number with values sing and plur and a class Things with values animal and plant.

Once a set of attributes has been defined, it may be used in other files. To show how this can be done, we will examine a variant lexicon file by typing:

FE lex2.le <Return>
This will bring up a file which looks like this:

Note that in a lexicon file, attributes appear in the third field. In the example shown here, each lexical entry contains one of the values for the class Number and one for Things. The distinction between singular and plural nouns is captured by having two entries for each noun. Within the third field, attributes are separated by commas.

Similarly, a syntax file may refer to attribute values or classes in order to specify in more detail the nodes of a tree. To illustrate this, let us begin by calling up the file syn3.sy, by typing:

FE syn3.sy <Return>

We see this:

In a syntax file, attributes attached to a node are placed within square brackets which immediately follow the node. In this case, the attribute specifies that the N chosen from the lexicon by the syntax rule must carry the value sing. In other words, only singular nouns will be chosen.
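
The effect of an attribute on a syntax node can be pictured as a filter on the lexicon. The Python sketch below is our own simplification, and the entries are assumed examples in the spirit of lex2.le rather than a copy of that file. An entry is eligible only if it has the right part of speech and carries every value required by the node.

```python
# Assumed entries in the spirit of lex2.le:
# (headword, part of speech, set of attribute values).
LEXICON = [
    ("cat", "N", {"sing", "animal"}),
    ("cats", "N", {"plur", "animal"}),
    ("bush", "N", {"sing", "plant"}),
    ("bushes", "N", {"plur", "plant"}),
    ("the", "DET", set()),
]


def candidates(pos, required, lexicon):
    """Return headwords of the given part of speech carrying every required value."""
    return [
        word
        for word, word_pos, attributes in lexicon
        if word_pos == pos and required <= attributes   # subset test
    ]
```

With these entries, candidates("N", {"sing"}, LEXICON) keeps cat and bush and rejects the plural forms, which is the behaviour of the rule described above.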

In order to make attribute classes and values available to other files, the ATtribute command is used.

In order to generate utterances using the files we have just seen, once ivi has been started, the following commands must be entered:

CO 7 <Return>
AT att1.at <Return>
TM term1.tm <Return>
LE lex2.le <Return>
SY syn3.sy <Return>

These move the focus to Core 7 and then install the various files. On the basis of this, we would expect to generate a series of singular nouns, and this is in fact what we see when we move to text mode and type a series of esc g. Output should look something like this:

As an experiment, edit the syn3.sy file to replace sing by plur, SAve it under the name syn3a.sy, install the new syntax file using the SYntax command, and generate output. You should see plural nouns.

As a further experiment, change the syntax file again to obtain only plural animal names. (Hint: attributes in syntax rules are also separated by commas.)

Morphology rules: the basics

In the lexicons we have seen so far, inflected forms of words appeared as separate lexical entries. In languages with simple morphology like English, this is perhaps not an insurmountable obstacle as the lexicon increases in size, but in others with richer morphologies the result would be an unreasonable expansion. To deal with this, VINCI includes mechanisms for inflecting lexical entries based, among other things, on the attributes present on the syntax nodes. In what follows, we will show how this may be done in a few simple cases.

Consider first the case of cat and cats. The only difference between the two is the addition of s to the plural form. We can capture this by means of a simple rule like this:

rule 1                  {This is the name of the rule}
   sing : #1;           {If the attribute 'sing' is present, use field 1}
   plur : #1 + "s";     {If the attribute 'plur' is present, add s to field 1}
%                       {End the rule}

This rule is defined in a morphology file. (The file has been included with the tutorial materials as mor1.mo.) For it to be used to inflect words, it must be referred to in each lexical entry to which it will apply. This is done by editing the contents of fields 3 and 5 of each lexical entry. If we modify our old lex2.le file, the result would look like this:

"cat"|N|Number, animal||$1|
"dog"|N|Number, animal||$1|
"bush"|N|sing, plant||#1|
"bushes"|N|plur, plant||#1|
"tree"|N|Number, plant||$1|
"the"|DET|||#1|

Note how in the case of cat, dog and tree the #1 in field 5 has become $1. The symbol $ followed by a string of digits or letters points to a rule name in the morphology file. (The revised lexicon file has also been included in the tutorial package as lex3.le.) Note also that in these same lexical entries, the attribute class Number appears. This means that all values in the class are now possible (in this case, sing and plur).
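
The interplay between the lexicon and rule 1 can be traced with a short Python sketch. It is a model of our own and not the VINCI morphology engine: the function and variable names are hypothetical, only the Number class is handled, and field 5 is assumed to be either #1 or a reference to a rule.

```python
# Attribute classes as described for att1.at.
CLASSES = {"Number": ["sing", "plur"], "Things": ["animal", "plant"]}


def rule_1(number, field1):
    """Python rendering of rule 1: sing uses field 1, plur adds "s"."""
    if number == "sing":
        return field1
    if number == "plur":
        return field1 + "s"
    raise ValueError(f"rule 1 has no case for {number!r}")


RULES = {"$1": rule_1}


def inflect(headword, attributes, morphology, number):
    """Return the surface form for the requested Number value, or None."""
    if number not in CLASSES["Number"]:
        raise ValueError(f"{number!r} is not a value of Number")
    # An entry matches if it names the value itself or the whole class Number.
    if number not in attributes and "Number" not in attributes:
        return None
    if morphology == "#1":                # use field 1 unchanged
        return headword
    return RULES[morphology](number, headword)
```

Thus the single entry for cat yields both cat and cats, depending on the value requested, while the entry for bushes (which still uses #1) is returned unchanged and only when plur is requested.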

To test the new morphology file and lexicon, enter the following commands in ivi. Note the addition of a new command MO which installs the morphology file. Note also the syntax file syn4.sy which calls for the production of a determiner followed by a plural noun.

CO 7 <Return>
AT att1.at <Return>
TM term1.tm <Return>
MO mor1.mo <Return>
LE lex3.le <Return>
SY syn4.sy <Return>

Type <Return> to enter text mode and generate some utterances. If you have installed the files correctly, you should see correctly formed plural nouns.

The perspicacious reader will have noted that there are still two entries in the lexicon for bush whose plural requires addition of es to the base form. As an exercise, revise the morphology file to add a second rule which captures this fact, and the lexicon to call this new rule, and generate some utterances using these revised files. You are now on your way to producing your own morphological descriptions. Of course, VINCI's morphological mechanisms include a rich set of operations which are beyond the scope of this simple tutorial. See the Overview and the Manual for details.

Conclusion

You should now have a good sense of how to use ivi/VINCI to generate simple utterances, and how to extend language descriptions to capture richer sets of possible structures. Over the past twenty years, the authors of ivi/VINCI have used it to generate a wide range of output in several languages. Discussion of this appears in our various publications. We are also in the process of encapsulating sample language description files so that they will be available for use and extension by others. See the main web page for details.

Postscript: installing ivi for access by all users (for more advanced users)

In the preceding sections, we have assumed that ivi will be installed in the MyVinci directory, along with all the data files to be worked on. If you have the authority (in other words, if you have superuser privileges, or if you are using Cygwin, which has no such privilege system), it might be worthwhile to install ivi in a directory where it will be available to all users. A command like this might work:

cp ivi /usr/local/bin/ivi

If this command (or one like it) succeeds, typing ivi in any directory will call up the program.

If you don't have superuser privileges, another alternative is to place ivi in some directory which is in your path (that is, the set of places where the system will search to find a command to be executed). You can set the path for your particular system without superuser privileges. Search the internet for explanations of how to do this on your particular system. Once this has been done, you will no longer need to type ./ before ivi in order to call it up. (./ tells the system to look for a command in the current directory.)
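
If you want to check whether ivi can already be found through your path, the standard Python function shutil.which performs the same search the system does. The wrapper name find_ivi is our own.

```python
import shutil


def find_ivi(path=None):
    """Return the full path of ivi if it is on the search path, else None.

    `path` is an optional string of directories in the same format as the
    PATH environment variable; when it is None, shutil.which consults PATH.
    """
    return shutil.which("ivi", path=path)
```

A result of None means that ivi is not in any directory on the path, so you will still need to type ./ivi from within the directory which contains it.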