A Computational Study of the Selection of Eye Fixation
Locations
Roger A. Browse and Marion G. Rodrigues
Abstract
Throughout the brief history of computational vision research, there has been a tendency towards
increased consideration of the psychology and neurophysiology of human vision (e.g., Marr,
1982). Despite this trend, machine vision systems rarely take into account the graded-resolution
structure of the retina or the selection system responsible for repositioning the high-resolution fovea.
There are computational advantages to assuming exhaustive processing of uniform-resolution
images, but this simplification perpetuates the misconception that all locations need to be
processed with equal scrutiny in order to permit effective vision.
In our research, we have developed computer vision systems that use graded-resolution
input and select processing locations autonomously. There are three main reasons for taking this
approach:
- It seems reasonable to expect that the processes of human vision, and the knowledge used
in vision, are highly tuned to the selective use of the fovea, and to integration of separate
glimpses of the world into a coherent percept. The computational modelling of such
processes may well hold the key to the elusive goal of generalized machine vision.
- As mobile robotics becomes an important application of machine vision, it is clear that a
solution to the problem of aligning sensors with critical aspects of the environment will be
necessary. This issue of selecting the processing location is even more serious for
localized senses in robotics, such as tactile perception (Browse and Lederman, 1986).
- Artificial intelligence research (of which computer vision is a sub-field) seeks out tasks
which demonstrate the depth of human intelligent action. Although lower animals also
select fixation locations, we believe that in humans this selection exhibits a
complex interaction among (1) response to stimulus cues (Yarbus, 1967), (2) goals of
the perceiving system (e.g., Loftus and Mackworth, 1967), and (3) status of interpretation
processes. As such, it is an ideal example of a task amenable to the techniques of expert
systems and computational analysis as a means of assessing the nature of intelligence in
general.
Our research has centered on the development of a computational technique called propagation,
by which detailed understanding of image content available at the fovea may be used to enhance
the coarse-level understanding available peripherally and parafoveally (Browse, 1983; Rodrigues,
1986; Browse and Rodrigues, 1987). In support of this idea, computational systems have been
devised which operate in both domain-dependent and domain-independent modes. In addition,
informal experiments have been carried out which support the idea that foveally based
interpretation may be propagated to the periphery to create a percept of high resolution in a
broader field of view than is actually available in a single fixation (Rodrigues, 1986). Given that
one goal of vision is to produce high resolution interpretation in as broad an area as possible with
a minimum number of fixations, one may examine peripheral locations for their potential to
permit propagation, and use this as a measure in the selection of fixation locations (Browse,
1983; Browse and Rodrigues, 1987).
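
The selection criterion described above can be illustrated with a minimal sketch. This is not the system reported in the cited work; it is a toy model under stated assumptions. The scene is assumed to be a small grid of coarse labels standing in for peripheral interpretation, a fixation is assumed to resolve every cell inside a square foveal window, and propagation is assumed to resolve any other cell whose coarse label matches one seen in that window. The names `propagation_gain`, `select_fixations`, and `fovea_radius` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Cell = Tuple[int, int]


def propagation_gain(coarse: Sequence[Sequence[str]],
                     resolved: Set[Cell],
                     fixation: Cell,
                     fovea_radius: int = 1) -> Set[Cell]:
    """Return the unresolved cells that one fixation would resolve.

    Assumption: cells in the foveal window are resolved directly, and
    cells elsewhere are resolved by propagation when their coarse label
    matches a label observed inside the foveal window.
    """
    rows, cols = len(coarse), len(coarse[0])
    fr, fc = fixation
    foveal = {
        (r, c)
        for r in range(max(0, fr - fovea_radius), min(rows, fr + fovea_radius + 1))
        for c in range(max(0, fc - fovea_radius), min(cols, fc + fovea_radius + 1))
    }
    foveal_labels = {coarse[r][c] for r, c in foveal}
    gained: Set[Cell] = set()
    for r in range(rows):
        for c in range(cols):
            if (r, c) in resolved:
                continue
            if (r, c) in foveal or coarse[r][c] in foveal_labels:
                gained.add((r, c))
    return gained


def select_fixations(coarse: Sequence[Sequence[str]],
                     fovea_radius: int = 1,
                     max_fixations: int = 10) -> List[Cell]:
    """Greedily pick fixations that maximize newly resolved area."""
    rows, cols = len(coarse), len(coarse[0])
    candidates = [(r, c) for r in range(rows) for c in range(cols)]
    resolved: Set[Cell] = set()
    fixations: List[Cell] = []
    while len(resolved) < rows * cols and len(fixations) < max_fixations:
        # Score every candidate by its potential to permit propagation.
        best = max(
            candidates,
            key=lambda p: len(propagation_gain(coarse, resolved, p, fovea_radius)),
        )
        gained = propagation_gain(coarse, resolved, best, fovea_radius)
        if not gained:  # no candidate adds anything new
            break
        resolved |= gained
        fixations.append(best)
    return fixations


if __name__ == "__main__":
    scene = [["a", "a", "b", "b"],
             ["a", "a", "b", "b"]]
    print(select_fixations(scene, fovea_radius=0))
```

In this toy scene with two uniformly labelled regions, two fixations suffice to resolve all eight cells, one per region. The greedy choice is only one possible reading of "a minimum number of fixations"; it does not guarantee an optimal fixation sequence.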
In this paper we explore the computational basis of such selection criteria in a model which
mediates between requirements for the development of a primal sketch (Marr, 1982) and the
development of interpretations based on both local and global inputs (Navon, 1977).