Propagation of Interpretations Based on Graded Resolution Input
Roger A. Browse and Marion G. Rodrigues
Abstract
The graded resolution and selectional aspects of human vision have only rarely been considered
in the development of computational vision systems. Such considerations may yield a great
advantage for machine vision because they provide a means of avoiding the fine detailed
processing of entire images. This paper presents a process called propagation which permits the
integration of image understanding based on distinct levels of resolution such as might be
available in a single fixation of a human eye. Two examples of this propagation process are
given: one which deals with image content at the level of the primal sketch, and the other which
deals with interpretations based on object models. The results obtained with peripheral
information may be examined to predict locations which, if fixated, have the greatest probability
of allowing propagation. These locations are the candidates for the next eye position.
Introduction
Computational vision research pursues the long-range goal of developing a generalized
vision system capable of performance comparable to that of biologic systems. In light of this
goal, it is not surprising that through the brief history of computer vision, there has been
increasing consideration of the operation of biologic systems in the design of machine perception
(e.g. Marr). Despite this trend, there remain two widespread assumptions about machine vision
which stand in sharp contrast to the realities of human perception.
The Uniform Retina Assumption
Due no doubt to the nature of available digitizing equipment, computer vision research has
almost unanimously chosen to assume that a common resolution level is available at all points of
the field of view. Such uniform images may be subjected to uniform processing, yielding an
immensely important simplification from the case of human vision in which acuity falls off
sharply even as close as 2 degrees from the foveal center. While this simplification appears
important for commercial applications of machine vision research, it perpetuates the
misconception that all locations of an image must be processed with equal scrutiny in order to
permit effective vision. Edge detection schemes employing center-surround mechanisms often
refer to the work of Wilson and Bergen as a statement of compatibility with human vision for the
convolution of images with operators of different scale, without ever noting that Wilson and
Bergen's results included a gradation in scale of about 40% within 4 degrees of the foveal center.
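To make the gradation concrete, the following is a minimal sketch, not a model taken from Wilson and Bergen. It assumes, purely for illustration, that operator scale grows linearly with eccentricity at a rate that reproduces the figure quoted above (about 40% over 4 degrees, i.e. 10% per degree). The function names, the linear form, and the `deg_per_pixel` parameter are all hypothetical.

```python
import math


def operator_scale(eccentricity_deg, foveal_scale=1.0, growth_per_deg=0.10):
    """Operator scale at a given eccentricity (degrees from the foveal center).

    Assumption: scale grows linearly, 10% per degree, so that it is about
    40% larger at 4 degrees than at the foveal center.
    """
    if eccentricity_deg < 0:
        raise ValueError("eccentricity must be non-negative")
    return foveal_scale * (1.0 + growth_per_deg * eccentricity_deg)


def scale_map(width, height, fixation, deg_per_pixel):
    """Operator scale for every pixel of a width x height image.

    `fixation` is the (x, y) pixel at which the fovea is centered, and
    `deg_per_pixel` converts pixel distance to visual angle (hypothetical).
    """
    fx, fy = fixation
    return [
        [
            operator_scale(math.hypot(x - fx, y - fy) * deg_per_pixel)
            for x in range(width)
        ]
        for y in range(height)
    ]


if __name__ == "__main__":
    # A 5 x 5 image fixated at its center, one degree per pixel.
    for row in scale_map(5, 5, fixation=(2, 2), deg_per_pixel=1.0):
        print(" ".join(f"{s:.2f}" for s in row))
```

Under this assumption, an operator applied 4 degrees from the fixation point is 1.4 times the size of the one applied at the foveal center, so a single fixation yields image descriptions at a continuum of scales rather than at one common resolution.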
The human visual system has probably evolved its graded resolution structure in
order to permit both high resolution and a broad field of view within limited processing
resources. It seems reasonable to expect that the processes of vision, and the knowledge
involved in vision, are highly tuned to the selective use of the fovea, and to the
integration of visual input obtained at many different scales into a coherent percept.
The research described in this paper is based on the belief that the time has come to stop
considering this graded structure of the retina as an unfortunate detail of the biologic
implementation, and to move towards developing the computational underpinnings of the
operation of realistic biologic vision.
The Appropriate Image Assumption
Computational vision systems usually make the assumption that the image under analysis
happens to contain an object or scene of the type that the system expects, usually nicely centered.
Of course, natural vision involves the requirement to align the sensors with the aspects of the
scene that are of interest. In particular, any vision system which uses graded resolution must
necessarily face the problem of selecting locations at which to apply central vision. During
picture viewing, humans reposition their eyes about 3 times per second, selecting important
information for detailed analysis and incorporation into a percept which seems independent of
these movements.
During the early period of computer vision research the assumption of appropriate images was
certainly justified in order to facilitate the investigation of the interpretation process. More
recently, however, robotics has emerged as a major application for machine vision, and mobile
robots will certainly require the ability to select areas to image from within their environment.
In this paper we present a process called propagation which allows the fine level understanding
available at the fovea to be used to draw inferences about the nature of information available
peripherally, at lower resolution. Two versions of this process are described: one which
is domain-independent and one which is domain-dependent. In the final section we provide an outline of how
the results of the propagation processes may be used to select locations for the application of
foveal vision.