Knowledge-Based Visual Interpretation Using Declarative Schemata

Knowledge-Based Visual Interpretation Using Declarative Schemata

Roger A. Browse

Abstract

One of the main objectives of computer vision systems is to produce structural descriptions of the scenes depicted in images. Knowledge of the class of objects being imaged can facilitate this objective by providing models to guide interpretation, and by furnishing a basis for the structural descriptions. This document describes research into techniques for the representation and use of knowledge of object classes, carried out within the context of a computational vision system which interprets line drawings of human-like body forms.

A declarative schemata format has been devised which represents structures of image features which constitute depictions of body parts. The system encodes relations between these image constructions and an underlying three dimensional model of the human body. Using the component hierarchy as a structural basis, two layers of representation are developed. One references the fine resolution features, and the other references the coarse resolution. These layers are connected with links representative of the specialization/generalization hierarchy. The problem domain description is declarative, and makes no commitment to the nature of the subsequent interpretation processes. As a means of testing the adequacy of the representation, portions have been converted into a PROLOG formulation and used to "prove" body parts in a data base of assertions about image properties.

The interpretation phase relies on a cue/model approach, using an extensive cue table which is automatically generated from the problem domain description. The primary mechanisms for control of interpretation possibilities are fashioned after network consistency methods. The operation of these mechanisms is localized and separated between operations at the feature level and at the model level.

The body drawing interpretation system is consistent with aspects of human visual perception. The system is capable of intelligent selection of processing locations on the basis of the progress of interpretation. A dual resolution retina is moved about the image collecting fine level features in a small foveal area and course level features in a wider peripheral area. Separate interpretations are developed locally on the basis of the two different resolution levels, and the relation between these two interpretations is analysed by the system to determine locations of potentially useful information.