Relations Between Schemata-Based Computational Vision
and Aspects of Visual Attention
Roger A. Browse
Introduction
This paper explores relations between aspects of visual attention and the operations of schemata-based computational vision systems. These relations suggest the need for methods that work towards interpretation without model invocation. A specific mechanism is described which permits interpretation-based interaction between information from different resolution levels, but does not rely on model invocation. This mechanism is then used in the examination of some related perceptual phenomena, permitting a more computational view of their operations.
Schemata-Based Vision Systems
An issue of interest to both cognitive psychology and artificial intelligence is the question of how
knowledge of a domain of objects can be applied towards visual interpretation. Schemata-based
knowledge organizations (Rumelhart and Ortony, 1976; Neisser, 1976) are now being used to
address this issue (Freuder, 1976; Havens, 1978; Havens and Mackworth, 1980; Browse, 1980).
One distinctive feature of a schemata-based interpretation is the organization of its domain
knowledge along "natural" lines. The knowledge is object centered and relies on familiar
structuring mechanisms such as component and instance hierarchies.
A domain of knowledge structured in this way is conducive to a recursive cuing mechanism
(Havens, 1978): basic image elements act as cues for simple scene objects, which in turn act as
cues for more complex objects, etc.
For example, in the domain of line drawings of human-like body forms (Browse, 1980; 1981), a
certain configuration of lines may cue a "hand", which in turn cues "arm", which cues "body".
At each level of this hierarchy, objects are described as being composed of simpler objects.
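This recursive cuing can be sketched minimally in Python. The dictionary structure and the "line-config" cue names below are illustrative assumptions; only the hand/arm/body chain follows the paper's example:

```python
# Toy component hierarchy: each object is described by the simpler
# objects that act as cues for it. The representation is an
# illustrative assumption, not the paper's actual data structure.
CUES = {
    "hand": ["line-config-A"],        # a line configuration cues "hand"
    "lower-arm": ["line-config-B"],
    "upper-arm": ["line-config-C"],
    "arm": ["hand", "lower-arm", "upper-arm"],
    "body": ["arm", "torso", "leg"],
}

def objects_cued_by(element):
    """Return all objects for which `element` is a direct cue."""
    return [obj for obj, parts in CUES.items() if element in parts]

def cue_upward(element):
    """Follow the recursive cuing chain from a basic image element."""
    chain = []
    frontier = [element]
    while frontier:
        nxt = []
        for e in frontier:
            for obj in objects_cued_by(e):
                if obj not in chain:
                    chain.append(obj)
                    nxt.append(obj)
        frontier = nxt
    return chain

print(cue_upward("line-config-A"))   # ['hand', 'arm', 'body']
```

A single line configuration thus cues "hand", which in turn cues "arm", which cues "body", exactly the bottom-up chain described above.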
The occurrence of the objects which are required in the description may not be enough, however,
to confirm the existence of the more complex object. There are also relations which must be
valid among the components. This distinction will be referred to as the distinction between
having found the required elements and having met the required relations.
For example, all the required elements may exist to make up an "arm": the "hand", the "upper-arm",
and the "lower-arm", but a number of required relations must also hold. The elements must
be connected in a certain way, and the angles between the elements must be within certain
bounds.
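The distinction between found elements and met relations can be sketched as two separate tests. The connectivity pairs and the angle bound below are illustrative assumptions, not the paper's actual constraints:

```python
# Confirming an "arm" requires both its elements and the relations
# among them. Names and thresholds here are illustrative assumptions.
REQUIRED_ELEMENTS = {"hand", "upper-arm", "lower-arm"}

def elements_found(found):
    """Test 1: have all required elements been located?"""
    return REQUIRED_ELEMENTS <= set(found)

def relations_met(connections, elbow_angle_deg):
    """Test 2: do the required relations hold among the elements?"""
    # Required relation: the elements are connected in a certain way.
    connected = ("hand", "lower-arm") in connections and \
                ("lower-arm", "upper-arm") in connections
    # Required relation: the joint angle lies within certain bounds.
    angle_ok = 0 < elbow_angle_deg <= 180
    return connected and angle_ok

found = {"hand", "lower-arm", "upper-arm"}
conns = {("hand", "lower-arm"), ("lower-arm", "upper-arm")}
print(elements_found(found) and relations_met(conns, 140))  # True
```

Passing the first test without the second corresponds to the "special situations" discussed next, where element information alone is still valuable.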
While it is difficult to be certain of the presence of an object on the basis of the required elements
only, we shall see that there are special situations in which this information is very valuable.
These situations rely on a capability of grouping image elements.
During the interpretation process, any element X in the image will have associated with it a set of
model possibilities (or labels). This set is simply the set of all objects which are described using
X as a required element. In the absence of a means of grouping elements, the interpretation
process may deal with the discovery of an element by taking the course of model invocation (or
testing). This operation involves selecting one or more of the model possibilities, and testing for
their existence by locating the other required elements, and determining the validity of their
required relations.
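Model possibility sets and the invocation step can be sketched as follows. The DESCRIPTIONS table and the pass-through relations test are illustrative assumptions:

```python
# Each object model lists its required elements; the table content
# is an illustrative assumption.
DESCRIPTIONS = {
    "arm": {"hand", "lower-arm", "upper-arm"},
    "leg": {"foot", "lower-leg", "upper-leg"},
}

def model_possibilities(element):
    """The labels of an element: all models that require it."""
    return {m for m, elems in DESCRIPTIONS.items() if element in elems}

def invoke(model, image_elements, relations_valid):
    """Test a model: locate the other required elements, then check
    the validity of the required relations."""
    return DESCRIPTIONS[model] <= image_elements and relations_valid(model)

image = {"hand", "lower-arm", "upper-arm"}
for m in model_possibilities("hand"):     # exhaustive search over labels
    print(m, invoke(m, image, lambda m: True))  # arm True
```

The loop over `model_possibilities` is the exhaustive search whose cost motivates the filtering methods described below.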
The model invocation approach can provide a dynamic determination of whether the processing
proceeds top-down or bottom-up (see Havens, 1978). It can also provide a means of iterative
refinement of interpretation and segmentation (see Mackworth, 1978). The operation of model
invocation can, however, be costly because it is an exhaustive search over the model
possibilities.
On some occasions it may be feasible to delete some of the model possibilities without actually
invoking them. This is possible whenever uniform constraining relations can be devised over a
type of image element.
For example, if we know that certain lines must be a part of the same object, then the model
possibility sets for those lines can be intersected.
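This intersection operation can be sketched directly; the label sets below are illustrative assumptions:

```python
# If two lines are known to belong to the same object, their model
# possibility sets can be intersected without invoking any model.
possibilities = {
    "line1": {"arm", "leg", "torso"},
    "line2": {"arm", "torso"},
}

def constrain_same_object(a, b, poss):
    """Uniform constraint: both lines keep only labels common to both."""
    common = poss[a] & poss[b]
    poss[a] = poss[b] = set(common)
    return common

print(sorted(constrain_same_object("line1", "line2", possibilities)))
# ['arm', 'torso']
```

Here "leg" is deleted from line1's possibilities without ever being invoked and tested.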
Waltz (1972) has shown that such a uniform constraint may be formulated for the interpretation
of line drawings of the blocks-world. Building on this result, Mackworth (1977) provided a
generalization of such network consistency methods for artificial intelligence problems.
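A minimal arc-consistency sketch in the spirit of these network consistency methods follows (AC-3-style filtering, as formalized by Mackworth, 1977). The variables, domains, and the x < y constraint are illustrative assumptions:

```python
from collections import deque

# Two variables with candidate label sets, and a binary constraint
# stated for each directed arc. All values here are illustrative.
domains = {
    "x": {1, 2, 3},
    "y": {1, 2, 3},
}
constraints = {
    ("x", "y"): lambda a, b: a < b,   # x < y, from x's side
    ("y", "x"): lambda a, b: b < a,   # x < y, from y's side
}

def revise(xi, xj):
    """Delete values of xi that have no supporting value in xj."""
    removed = False
    for a in set(domains[xi]):
        if not any(constraints[(xi, xj)](a, b) for b in domains[xj]):
            domains[xi].discard(a)
            removed = True
    return removed

def ac3():
    """Propagate deletions until every arc is consistent."""
    queue = deque(constraints)
    while queue:
        xi, xj = queue.popleft()
        if revise(xi, xj):
            for (a, b) in constraints:
                if b == xi and a != xj:
                    queue.append((a, b))

ac3()
print(domains)   # x loses 3 (no y > 3); y loses 1 (no x < 1)
```

As with the line-label intersection above, possibilities are deleted purely by constraint propagation, with no model ever invoked.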