Quilt: Design for the Software Carpentry Contest build Category

Overview

Quilt is a tool I have designed, on and off, for the last 15 years or so. This version is a submission to the Software Carpentry contest, in the SC build category. Due to my prolonged illness this entry is incomplete, but I am submitting in hope that there is enough content to show the judges what I had planned to do.

Much of the content is revisions of earlier papers, cited in the references. My tools (wc of a lynx -dump) say this is just under 4420 words long.

This document is Copyright 2000 David Alex Lamb, and is available under the Software Carpentry Open Publication Licence

Contact

Name David Alex Lamb
Address Department of Computing and Information Science
Goodwin Hall
Queen's University
Kingston, Ontario, CANADA K7L 3N6
E-mail http://www.cs.queensu.ca/~dalamb/

Requirements

The requirements phase of this project consists of identifying the original requirements stated at the Software Carpentry site; identifying my own personal additional requirements, and developiong a conceptual model of the long-term data the tool must manage.

Original Requirements

The original requirements for this project come from the Software Carpentry SC Build contest category.
  1. Suitable replacement for UNIX make, with features such as:
  2. Managing and representing dependencies (item 1)
  3. Handling options in compilation (item 2)
  4. Handling complex directory structures, such as separate build and source trees (item 4)
  5. Using techniques other than timestamps (item 6)
  6. Inspecting a complex build, e.g. graphical dependency display (item 11)
  7. Keeping track of artifacts it has produced, e.g. for cleanup (item 13)
  8. Feedback on the progress of the build (item 10)
  9. Management of tools used during the build:

Additional Problem Analysis

My previous work has involved systems with the following characteristics: These experiences have let to my personal goals for this project:
  1. Make reasonable use of Ellen Borison's model of software manufacture and my previous related work on abstractions and relations.
  2. Provide a flexible mechanism for describing build steps, separating inputs into explicit (e.g. a file to compile) and implicit (e.g. the #include files.
  3. Cope with complex projects where portions of a system are under the control of separate groups, e.g. separate "front end" and "back end" and "tools" teams for a compiler-building project.
  4. Allow for, but do not require, interaction between the build process and other tools. For example, a language processor might produce a list of direct dependencies it discovers, and ask the build processor to record them (so that in future they're done first).
The following sections show several specific problems that occur in this kind of system.

Hidden Dependencies

Consider the following example, which shows a make input file for a small system consisting of a main program and two submodules. This example ignores make's mechanisms for specifying default rules, which could shorten the example somewhat, but do not solve the problems we address here.
CCFLAG=-c -g
prog: main.o mod1.o mod2.o mod3.o
	ld -o prog main.o mod1.o mod2.o mod3.o
main.o: main.c mod1.h mod2.h
	cc ${CCFLAG} main.c
mod1.o: mod1.c mod1.h mod3.h
	cc ${CCFLAG} mod1.c
mod2.o: mod2.c mod2.h mod3.h
	cc ${CCFLAG} -DLISTOPTION=short mod1.c
mod3.o: mod3.c mod3.h
	cc ${CCFLAG} -DDEBUG mod1.c
Each line that starts in the left margin says that the target file, whose name precedes the colon (:), depends on all those files whose names follow the colon; if any of them change, make must use the immediately following command to rebuild the target. The dependency of prog on main.o arises because main.o is a relocatable file we pass to the linker to make up the executable program. The dependency of main.o on main.c arises because main.c is the source file from which we get the relocatable file main.o; the dependencies on mod1.h and mod2.h arise because main.c incorporates these two files by file inclusion. However, there are several more hidden dependencies. Even if we could in principle record all these dependencies, forcing a system designer to specify the full manufacture graph in such detail would cause enormous headaches. Adding independent subsystems causes even more problems, since now portions of the manufacture graph for a system would be under the control of the developers of different subsystems.

Manufacture Abstractions

Anyone who does much complex software manufacture eventually develops a collection of idioms she uses over again. Such idioms are patterns of multi-step manufacture subprocesses; they have several characteristics: make has no abstraction mechanism that can express such idioms; the only way to capture them is to use tools such as the sed stream editor to generate portions of a make input file.

For example, the following figure shows the process I used to produce individual chapters of a Software Engineering textbook.

The & symbol stands for the name of a chapter. The & object is the main exported result; it involves first making the printable form of the chapter (&.hp), then postprocessing some auxiliary outputs to update global book-wide data files. The &.chk object involves building three different objects (&.spl, &.styl, &.dic), which contain reports from simple machine proofreading programs; you might want to select each of them independently. Some objects serve as intermediate processing points; for example, &.de is the result of filtering out text formatter markup that would confuse the proofreading programs; it is input to two of the proofreaders.

We can look to standard ideas from programming language design to solve these problems. Given any language semantic unit X, such as a manufacture step, we can introduce the idea of an X-valued abstraction. Thus the input language for a manufacture tool should allow parameterized abstractions. The parameters would be objects (such as files or strings). Thus, for example, I might invoke a Chapter abstraction as:

intro: chapter(intro.n)
An abstraction would act somewhat like a generic module (Ada) or template class (C++). After you invoke such an abstraction, you could reference individual exported components (such as intro.use, or intro.hp). You could independently build any exported component. The manufacture tool might support a library mechanisms for such abstractions, so you could reuse them.

One sign that an idea is a good one is for it to elegantly solve problems other than the ones you planned for. The serendipity with abstractions is that the same mechanism works for subsystem descriptions; a subsystem might be just an abstraction that you invoke only once per system.

Implied dependencies

make has difficulty handling transitive dependencies. For example, consider the C source file x.c, which incorporates several .h files by file inclusion:
x.o: x.c x.h lib1.h lib2.h lib3.h
	cc $(CCFLAG) x.c
Now suppose we are trying to simulate abstract data types in C. Module lib1 might define some type with some fields intended to be visible to its clients, and others intended to be hidden. Suppose we add some "hidden" fields, and so modify lib1.h to include adt1.h. Unfortunately, we cannot really hide such fields in C, even though we know we will not reference them in clients of lib1; we must recompile x.c whenever adt1.h changes, since the sizes of some records defined in lib1.h might change. Existing approaches have problems. I believe the best approach is to separate out different kinds of dependencies. Thus x.o has a clear-cut direct dependency on x.c; however, depending on x.c implies a dependency on lib1.h and so on; depending on lib1.h implies a dependency on adt1.h. We can construct a database of such implied dependencies. If x depends directly on y, and y implies a dependency on z, then changing z means rebuilding x, without rebuilding y. We can build such a database by hand if we have no appropriate tools. When we can afford to schedule the effort to build one, a database editor can invoke language-specific scanners to find or check such dependencies when asked to do so.

Derived dependencies

Suppose source file mod.c depends directly on lib1.h. Normally, lib1.h is the specification part of a module whose implementation is in lib1.o. This means that mod.o implies a dependency on lib1.o; that is, any system that includes mod.o must also include lib1.o.

Current systems manage this problem in one of two ways, both of which have some difficulties.

  1. The system builder might explicitly list all relocatable object files for the linker. This is a tedious and error-prone process if done manually.
  2. The system builder might instruct the manufacture tool to place all such relocatable object files in a linker library, and rely on the library external symbol resolution rules to link in all appropriate components.
The second solution can work if each separate project using several subsystems can afford the space for a library file large enough to hold copies of all the object files of all the subsystems, and if the linker can correctly resolve backward references in such a library. However, many projects cannot afford the space for so many copies; instead, they expect to have separate libraries for each subsystem, shared among all the users of a subsystem. Furthermore, many linkers are not clever enough for such a search strategy; they may expect new object modules to refer only to previously-linked symbols, or to symbols that will appear in later object modules. Even if they do allow arbitrary references within a library file, they may not allow it across a list of library files; thus, projects that cannot afford so many copies of object modules are out of luck if there are any loops in the relationships among subsystems.

A related problem for systems that must explicitly list what object modules to link is subset management. Suppose a subsystem exports k modules, each of which depends on different subsets of its hidden modules. Different clients might import different subsets of the exported modules. The problem is that there are potentially 2k-1 distinct collections to manage.

A possible direction for a solution is to develop a mechanism for the process of constructing mod.o from mod.c to generate an implied dependency of mod.o on lib1.o; we call this a derived dependency. The manufacture step for constructing the executable program would specify only a direct dependency on main.o, the object file for the main program. The manufacture tool could deduce all the rest of the object files from implied and derived dependencies. Indeed, the simplicity of Ada-specific or Modula-specific manufacture tools is that they can deduce these two kinds of dependencies automatically.

System-wide processing

The need to find all the appropriate object files to link is a special case of the general problem of processing all files of certain characteristics from all subsystems as a group. Other examples from my past projects A solution to this problem seems to require two mechanisms:
  1. Abstractions need the ability to add dependencies to a parameter or global object. For example, the IDL translator abstraction could add each structure symbol table to an "all IDL symbol tables" object.
  2. There should be some means for turning a set of dependencies into input for a tool.

Requirements Analysis

The first two sections, on original requirements and problem analysis, are the system requirements to analyze. The three main steps are identifying the primary concepts, identifying actors and use cases, and representing the analysis of this information as a conceptual model.

Primary Concepts

A common first step in analysing requirements is to look for frequently-occurring words or phrases (shown in italics in the following). The main concepts expressed in these requirements are:

Actors and Use Cases

These requirements suggest four main roles (actors in UML-speak), and one that combines two others. As usually, one person might perform some or all of these roles.

Conceptual Model

The diagram is incomplete; it needs to show tools, files, and strings as subclasses of object. Objects also need "types" such as programming language source file, relocatable object file, and so on.

Design

At this stage of development we augment the classes of the conceptual model with additional implementation-specific classes, assign responsibilities to specific classes, and identify collaborations among the classes.

Borison's Model

In this work, I assume the basic software manufacture tool follows Borison's model. The key aspects of this model are as follows:

Relations

For concreteness we suggest a particular way of encoding relations as an incremental change in the current make input language.

Expressions

The following table defines the operations permitted in such expressions. If several such definitions occur, the relation is their union; this permits a form of distributed specification of a relation.
OperationInfixPostfix
;composition
!inverse
+uniontransitive closure
-set difference
*intersectionreflexive transitive closure
( )parentheses (grouping)
One specifies when to rebuild a generated file by naming the relation that governs generation.
output_file (expression):
    command ....
means that, whenever any file related to output_file by the given relational expression changes, one executes the given command to rebuild the output file. We permit only one such declaration per output file; in future work we will investigate methods of disambiguating among several of them.

Certain dependencies are common enough to require some form of rule. Thus we allow

&.c :(c_derives) &.o
to mean that any file with type .c has relation c_derives to a file of the name name with type .o. This is similar to the metarules of the mk[mk] program; however, we allow naming several separate dependencies, while mk allows more complex name pattern matching.

Sets

Any set of files corresponds to an identity relation. The expression (rel!);rel defines an identity relation for the range of rel; similarly, rel;(rel!) corresponds to the domain. Certain useful sets correspond to such domains and ranges; we introduce dom(relation) and rng(relation) to represent them. The set of all object modules generated by the C compiler is thus rng(c_derives). Composing a relation with a set means restricting the domain or range of the relation; thus rel;set is the subset of rel with range restricted to set. We can say {file1,file2,...,fileN} to explicitly construct a set of N named files.

Tools may need to process lists of files obtained from relations; we thus give a way to generate a string from a set or relation. The expression @[relation] constructs a string consisting of the names of all the files in the domain and range of the relation, separated by spaces. @[option relation] gives a sequence of files in a topological sort of relation; if file A is related to file B, then the < (>) option requires that A appear before (after) B in the result string. @[option relation[file]] gives a topological sort of the named relation starting with the given file instead of with the natural roots of the relation. @[relation[file]] is the set of files related to the given file by the given relation; it is a shorthand for @[rng({file};relation)]. As topological sorting is related to transitive closure, @[option relation+] gives the same results as @[option relation].

Implementation Classes

The implementation requires augmenting the class diagram of the conceptual model with classes that give additional detail (such as relational expressions), and which provide much of the detailed processing. Having run out of time, I omit the former and summarize the latter in the next section.

Responsibilities and Collaborations

The following figure illustrates the basic subsystem of the quilt prototype.
The core of quilt consists of:
  1. The set/relation database manages sets, such as "all C source files", base relations (such as "file directly includes other file"), and relational expressions (such as "directly or indirectly includes").
  2. The database may interact with other tools, especially to extract language-specific base relations, like "includes" and "subclass of". Other tools may also request lists of objects meeting certain criteria, such as "all Java modules to be compiled before this one".
  3. The manufacture template, conceptually similar to a make input file (makefile), describes how to generate objects from other objects, and under what conditions to do so (such as: whenever there is a change in some source object on which the generated object depends).
  4. The object cache is the set of all objects generated under the direction of the manufacture template.
  5. The manufacture graph is an instantiation of the manufacture template, with placeholders in the template replaced by references to particular objects in the cache. A fully instantiated graph (one with no placeholders remaining) represents one complete version of a system.
  6. The manufacture engine interprets the template and adds generated objects to the cache, while updating the graph to represent the newly-generated objects.
  7. What distinguishes quilt from other make-like programs is the instantiator, which constructs the manufacture template from a database of user-definable relations among objects, and an abstract manufacture schema, which describes how to decide what steps are needed to produce certain kinds of output from available inputs (a cousin to the implicit rules used in make).
If one changes some source objects, the engine builds a new manufacture graph to represent the changed system. Objects from the old system might remain in the cache, depending on the engine's cache control policy, so that reverting to older versions of some source files might not require generating new objects, but merely changing the manufacture graph.

User Interface

References

  1. Ellen Borison, "A Model of Software Manufacture," in Reidar Conradi, Proceedings of the International Workshop on Advanced Programming Environments, pages 197-200, Springer-Verlag (1986). Lecture Notes In Computer Science 244.
  2. Andrew Hume. Mk: a successor to make Proceedings of the Summer 1987 USENIX Conference, pages 445-457, 1987
  3. David Alex Lamb, "Relations In Software Manufacture" Queen's University Department of Computing and Information Science Technical Report 1990-292 (postscript; 73.261 Kbytes)
  4. David Alex Lamb, "Abstraction Problems in Software Manufacture." Queen's University Department of Computing and Information Science Technical Report 1989-243 (postscript; 58.211 Kbytes) February 1989. 7 pages.
  5. Peter Miller. Recursive Make Considered Harmful