Knowledge-Based Visual Interpretation Using
Declarative Schemata
Roger A. Browse
Abstract
One of the main objectives of computer vision systems is to produce structural descriptions of
the scenes depicted in images. Knowledge of the class of objects being imaged can facilitate this
objective by providing models to guide interpretation, and by furnishing a basis for the structural
descriptions. This document describes research into techniques for the representation and use of
knowledge of object classes, carried out within the context of a computational vision system
which interprets line drawings of human-like body forms.
A declarative schemata format has been devised to represent the structures of image features
that constitute depictions of body parts. The system encodes relations between these image
constructions and an underlying three-dimensional model of the human body. Using the
component hierarchy as a structural basis, two layers of representation are developed. One
references the fine-resolution features, and the other references the coarse-resolution features. These
layers are connected with links representative of the specialization/generalization hierarchy. The
problem domain description is declarative, and makes no commitment to the nature of the
subsequent interpretation processes. As a means of testing the adequacy of the representation,
portions have been converted into a PROLOG formulation and used to "prove" body parts in a
data base of assertions about image properties.
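
The idea of "proving" a body part from a database of image assertions can be illustrated with a small sketch. It is written in Python rather than PROLOG, and the schema names, feature names, and the `prove` function are all hypothetical; it shows only the general backward-chaining pattern over a component hierarchy, not the system's actual schemata or PROLOG formulation.

```python
# Hypothetical declarative schemata: each body part lists the image
# features or sub-parts that must be present. All names are illustrative.
SCHEMATA = {
    "hand": ["palm_blob", "finger_lines"],
    "lower_arm": ["parallel_lines"],
    "arm": ["lower_arm", "hand"],
}


def prove(goal, assertions, schemata=SCHEMATA):
    """Return True if `goal` is asserted directly, or if every
    component listed in its schema can itself be proved."""
    if goal in assertions:
        return True
    parts = schemata.get(goal)
    if parts is None:
        # Not asserted and no schema to decompose it: cannot be proved.
        return False
    return all(prove(part, assertions, schemata) for part in parts)


# A toy "data base" of assertions about image properties.
image_facts = {"palm_blob", "finger_lines", "parallel_lines"}

arm_found = prove("arm", image_facts)        # all components are supported
arm_missing = prove("arm", {"palm_blob"})    # lower_arm cannot be proved
```

Because the schemata are plain data, the same description could be handed to a different interpretation procedure without change, which is the sense in which a declarative description makes no commitment to the interpretation process.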
The interpretation phase relies on a cue/model approach, using an extensive cue table which is
automatically generated from the problem domain description. The primary mechanisms for
control of interpretation possibilities are fashioned after network consistency methods. The
operation of these mechanisms is localized and separated between operations at the feature level
and at the model level.
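
As a rough illustration of the cue/model approach, the sketch below inverts a hypothetical problem domain description into a cue table and then applies a generic label-discarding filter in the style of network consistency methods. The schema contents, the `ATTACHED` relation, and both function names are assumptions made for this example; the system's actual mechanisms, including their separation into feature-level and model-level operations, are not reproduced here.

```python
def build_cue_table(schemata):
    """Invert schemata (model -> features) into a cue table (feature -> models)."""
    table = {}
    for model, features in schemata.items():
        for feature in features:
            table.setdefault(feature, set()).add(model)
    return table


def filter_consistent(candidates, neighbours, compatible):
    """Repeatedly discard any label that has no compatible label
    at one of its neighbouring image elements."""
    changed = True
    while changed:
        changed = False
        for elem, labels in candidates.items():
            for label in list(labels):
                for other in neighbours.get(elem, ()):
                    if not any(compatible(label, o) for o in candidates[other]):
                        labels.discard(label)
                        changed = True
                        break
    return candidates


# Hypothetical problem domain description.
schemata = {
    "hand": ["blob", "short_lines"],
    "foot": ["blob", "long_line"],
    "lower_arm": ["parallel_lines"],
    "lower_leg": ["parallel_lines"],
}
cue_table = build_cue_table(schemata)

# Hypothetical attachment constraints between adjacent body parts.
ATTACHED = {frozenset({"hand", "lower_arm"}), frozenset({"foot", "lower_leg"})}


def compatible(a, b):
    return frozenset({a, b}) in ATTACHED


# Element e1 shows the cue "short_lines"; its neighbour e2 shows "parallel_lines".
candidates = {
    "e1": set(cue_table["short_lines"]),
    "e2": set(cue_table["parallel_lines"]),
}
neighbours = {"e1": ["e2"], "e2": ["e1"]}
result = filter_consistent(candidates, neighbours, compatible)
```

In this toy case the unambiguous hand at `e1` removes the `lower_leg` possibility at `e2`, using only comparisons between neighbouring elements.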
The body drawing interpretation system is consistent with aspects of human visual perception.
The system is capable of intelligent selection of processing locations on the basis of the progress
of interpretation. A dual resolution retina is moved about the image collecting fine level features
in a small foveal area and coarse level features in a wider peripheral area. Separate
interpretations are developed locally on the basis of the two different resolution levels, and the
relation between these two interpretations is analysed by the system to determine locations of
potentially useful information.
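
The dual resolution retina can be pictured with a minimal sketch in which a single fixation yields fine level features from a small foveal radius and coarse level features from a wider peripheral radius. The feature records, radii, and the `retina_sample` function are hypothetical and assume features have already been extracted and tagged by resolution level; how the system chooses its next processing location is not modelled.

```python
import math


def retina_sample(features, fixation, fovea_radius, periphery_radius):
    """Return (fine, coarse) feature names visible from one fixation:
    fine features only inside the fovea, coarse features anywhere
    inside the wider periphery."""
    fine, coarse = [], []
    for f in features:
        d = math.dist(f["pos"], fixation)
        if f["level"] == "fine" and d <= fovea_radius:
            fine.append(f["name"])
        elif f["level"] == "coarse" and d <= periphery_radius:
            coarse.append(f["name"])
    return fine, coarse


# Hypothetical pre-extracted features at two resolution levels.
features = [
    {"name": "finger_line", "level": "fine", "pos": (10, 10)},
    {"name": "toe_line", "level": "fine", "pos": (40, 80)},
    {"name": "limb_blob", "level": "coarse", "pos": (30, 20)},
    {"name": "far_blob", "level": "coarse", "pos": (200, 200)},
]

fine, coarse = retina_sample(features, fixation=(12, 12),
                             fovea_radius=5, periphery_radius=50)
```

Moving `fixation` changes which fine features are available, so separate interpretations built from `fine` and `coarse` can disagree; such disagreement is the kind of signal that could indicate where to look next.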