David Skillicorn - Older Papers
This page attempts to make my papers available on the web without violating copyrights. The papers here are either early versions of papers that have appeared, in which case a hint is given about where they did eventually appear, or drafts, about which I would appreciate comments.
They are divided by topic like this:
BSP Design and Performance
There are three big challenges for mainstream parallel computing: building useful hardware platforms, designing programming models that are effective, and designing a software construction process that builds correctness into software. The first has largely been solved, at least for current technology. The second has been an active area of research for perhaps fifteen years, while work on the third has barely begun. In this chapter, we describe the Bulk Synchronous Parallel (BSP) model which, at present, represents the best compromise among programming models for simplicity, predictability, and performance. We describe the model from a software developer's perspective and show how its high-level structure is used to build efficient implementations. Almost alone among programming models, BSP has an associated cost model so that the performance of programs can be predicted on any target without laborious benchmarking. Some progress towards software construction has also been made in the context of BSP.
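For orientation (this is the standard form of the BSP cost model rather than anything quoted from the chapter), each superstep is charged for its computation, its communication, and its barrier, so a program costs roughly $\sum_i (w_i + h_i g + l)$, where $w_i$ is the largest amount of local work done by any processor in superstep $i$, $h_i$ is the largest number of words sent or received by any processor in that superstep, $g$ is the network's permeability (time per word of continuous traffic), and $l$ is the cost of barrier synchronisation.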
with Stephen Donaldson and Jonathan Hill.
We describe a transport protocol suitable for BSPlib programs running on a cluster of PCs connected by a 100Mbps Ethernet switch. The protocol provides a reliable packet-delivery mechanism that uses global knowledge of a program's communication pattern to maximise the performance of the switch. The performance is comparable to previous low-latency protocols on similar hardware, but the addition of reliability means that this protocol can be directly used by application software. For a modest budget of $US20,000 it is possible to build a machine that outperforms an IBM SP2 on all the NAS benchmarks (BT +80%, SP +70%, MG +9%, and LU +65% improvement), and an SGI Origin 2000 on half (BT +10%, SP -24%, MG +10%, and LU -28%). The protocol has a CPU overhead of 1.5 microsecs for packet download and 3.6 microsecs for upload. Small packets can be communicated through the switch in a pipelined fashion every 21 microsecs. Application-to-application one-way latency is 29 microsecs plus the latency of the switch. A raw link bandwidth of 93Mbps is achieved for 1400-byte packets, and 50Mbps for 128-byte packets. This scales to eight processors communicating at 91Mbps per link, to give a sustained global bandwidth of 728Mbps.
Postscript (Oxford TR-5-98)
European sites may wish to download it from Oxford.
Comment: This paper presents some compelling performance data suggesting that models such as BSP gain an advantage from the implicit global data available to each thread. This knowledge makes some novel protocols possible. What it doesn't explicitly say is that this same argument suggests that LogP is at a performance disadvantage with respect to BSP as well.
(Invited Talk at Europar '98, September 1998. Joint work with Jonathan Hill and Stephen Donaldson.)
Many popular parallel programming models are based on independent threads executing on each processor, with some sort of coordination mechanism to handle communication. Each thread knows very little about what other threads are doing.
In contrast, in models that are based on machine-wide operations, such as skeletons and BSP, each thread has a great deal of explicit and implicit knowledge about what other threads are doing. This knowledge can be exploited to make better decisions about when and how to communicate, which in turn results in improved performance.
We illustrate how global knowledge is used in the design of BSPlib, where the performance gains occur, and how large they are. So much improvement is possible that explicit extra communication might be worth using for models that would not otherwise share thread state.
This talk is loosely based on the paper "BSP Clusters", although it is not nearly so technical.
with Stephen Donaldson and Jonathan Hill.
The BSP cost model measures the cost of communication using a single architectural parameter, g, which measures the permeability of the network to continuous traffic. Architectures such as networks of workstations pose particular problems for high-performance communication because it is hard to achieve high communication throughput, and even harder to do so predictably. Yet both of these are required for BSP to be effective. We present a technique for controlling applied communication load that achieves both. Traffic is presented to the communication network at a rate chosen to maximise throughput and minimise its variance. Significant performance improvements can be achieved compared to unstructured communication over the same transport protocols, as used by, for example, MPI.
with Cristina Boeres and Vinod Rebello,
Keywords: latency, linear model, overhead model, multi-stage scheduling, MSA.
A version appears in the proceedings of Europar'98.
Comment: BSP's requirement that programs use supersteps feels restrictive to programmers in some application domains (interestingly, many `traditional' scientific programmers find supersteps very natural). This paper explores how well an arbitrary task graph can be packed into supersteps, i.e. whether doing so extends its makespan. It turns out that the packing does pretty well.
with M. Danelutto, S. Pelagatti, and A. Zavanella,
We describe the use of the BSP cost model to optimise programs using skeletons or data-parallel operations, in which program components may have multiple implementations. The use of BSP transforms the problem of finding the best implementation choice for each component that minimises overall execution time into a one-dimensional minimisation problem. An algorithm which finds optimal implementations in time linear in the length of the program is given.
Keywords: skeletons, P3L, optimization, optimisation, shortest path, cost models.
A shorter version appears in the proceedings of Europar'98.
Comment: Many skeleton-based approaches face the problem of optimising compositions of skeletons when there are multiple choices for each one. Not much progress has been made because the data arrangements needed to interface between each pair of possible implementations get complicated. This paper explains how the BSP cost model simplifies the problem setting, and then gives a shortest-path algorithm that finds the optimal choices. It's not technically difficult, but it dramatically improves the state of the art.
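To make the comment above concrete, here is a small hypothetical sketch (invented names and cost numbers, not the algorithm as presented in the paper) of the shortest-path idea: pick one implementation per component so that the sum of execution costs and interface (data-rearrangement) costs is minimal, in a single pass that is linear in the number of components.

    # Hypothetical sketch: pick one implementation per program stage so that the
    # sum of execution and interface (data-rearrangement) costs is minimal.
    # All names and cost numbers below are invented for illustration.

    def best_choices(exec_cost, glue_cost):
        """exec_cost[i][a]    : cost of running stage i with implementation a.
        glue_cost[i][a][b]    : cost of converting the data layout produced by
                                implementation a of stage i into the layout
                                expected by implementation b of stage i+1.
        Returns (minimal total cost, chosen implementation index per stage)."""
        best = list(exec_cost[0])            # cheapest way to finish stage 0 via each choice
        back = []                            # back-pointers for recovering the choices
        for i in range(1, len(exec_cost)):
            stage_best, stage_back = [], []
            for b, run in enumerate(exec_cost[i]):
                cands = [best[a] + glue_cost[i - 1][a][b] + run
                         for a in range(len(exec_cost[i - 1]))]
                a_min = min(range(len(cands)), key=cands.__getitem__)
                stage_best.append(cands[a_min])
                stage_back.append(a_min)
            best = stage_best
            back.append(stage_back)
        last = min(range(len(best)), key=best.__getitem__)
        chosen = [last]
        for stage_back in reversed(back):    # walk the back-pointers to the front
            chosen.append(stage_back[chosen[-1]])
        chosen.reverse()
        return best[last], chosen

    # Three stages, two candidate implementations each (costs invented).
    exec_cost = [[10, 14], [20, 12], [8, 9]]
    glue_cost = [[[0, 5], [4, 0]],           # interfaces between stages 0 and 1
                 [[0, 3], [6, 0]]]           # interfaces between stages 1 and 2
    print(best_choices(exec_cost, glue_cost))  # -> (35, [1, 1, 1])

Because the BSP cost model reduces each component's execution cost and each interface's cost to a single number, the whole optimisation collapses to this one-dimensional pass.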
with J.M.D. Hill and S. Donaldson.
The BSP cost model makes a new level of power available for designing parallel algorithms. First, it models the actual behaviour of today's parallel computers, and so can be used to choose appropriate algorithms without completely implementing them. Second, it becomes possible to characterise the range of architecture performance over which a particular algorithm is the best choice. This provides the foundation for developing software that is portable both at the source code level and in its expectation of performance. We illustrate by comparing three possible implementations of broadcast, and show that a two-phase broadcast algorithm outperforms other techniques whenever the size of the data is large relative to the cost of synchronisation, and that broadcasting using trees is never a good technique (despite its continued popularity). We carry out a similar analysis for samplesort, and show that samplesort cannot perform well on networks of workstations unless the network bandwidth exceeds a certain threshold.
This paper appears in the proceedings of Massively Parallel Programming Models '97, London, November 1997, IEEE CS Press.
Comment: This paper gives a nice example of the clarity that the BSP cost model provides, and how this pays off in design. Poor broadcast techniques are still widely used, however. After reading this paper you'll know better.
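For a flavour of the analysis (a sketch using the usual BSP h-relation accounting, not an excerpt from the paper), consider broadcasting $n$ words from one of $p$ processors. Sending the whole message directly to everyone costs about $n(p-1)g + l$; a binary tree costs about $\lceil\log_2 p\rceil(ng + l)$; and the two-phase method, which scatters $n/p$-word pieces and then totally exchanges them, costs about $2\frac{n}{p}(p-1)g + 2l \approx 2ng + 2l$. Once $ng$ is large compared with $l$, the two-phase cost is essentially independent of $p$, while the tree pays its $\log p$ factor on both terms, which is why it never ends up being the best choice.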
Bulk Synchronous Parallelism (BSP) as a parallel model enables accurate costs of parallel programs to be predicted from the program structure and two architectural parameters, $g$, the permeability of the network, and $l$, the time required for barrier synchronisation. Networks such as ATM already play a role in parallel computers built as networks of workstations, and may become the standard mechanism for interconnecting processors at all scales. We present an analytic model for determining the BSP parameters of such architectures. Although the model is simple, there is substantial agreement with measured results where these are known. This represents the first time that these architectural parameters have been determined other than by benchmarking, and suggests that the approach may be serviceable for other wormhole routed networks.
Keywords: parallel computing, interconnection network, performance modelling, total exchange, bulk synchronous parallelism, latency, throughput.
Postscript (Queen's TR97-414)
Comment: This paper is the first analytic determination of BSP parameters. The results obtained agree quite well with those measured by other people. It doesn't look as if ATM is likely to become important in parallel computers, but the modelling techniques used here may be adaptable to some other networks.
The BSP cost model measures the cost of communication using a single architectural parameter, $g$, which measures the permeability of the network to continuous traffic. Some architectures, typically networks of workstations, pose particular problems for high-performance communication because it is hard to achieve high throughput, and even harder to do so predictably. Yet both of these are required for BSP to be effective.
We present a technique for controlling applied communication load that achieves both. Traffic is presented to the communication network at a rate chosen to maximise throughput and minimise its variance. Performance improvements as large as a factor of two over MPI can be achieved.
Keywords: Bulk synchronous parallelism, network performance, throttling, MPI, CSMA/CD, Ethernet, slotting.
Postscript (Oxford TR-40-97)
European sites may wish to download it from Oxford. A shorter version appears in the proceedings of Europar'98.
Comment: It's interesting that even a clunky protocol like TCP/IP can be controlled well enough to match its low-level behaviour to a target network. This is another illustration of how BSP's view of communication as a bulk behaviour of programs pays performance dividends.
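The throttling idea described above can be illustrated with a toy pacing loop (purely illustrative, not the BSPlib transport described in the paper): packets are handed to the network on a fixed cadence chosen as a fraction of the link capacity, rather than as fast as the sender can produce them.

    import time

    def paced_send(packets, send, target_bps, packet_bytes):
        """Toy pacing loop: hand packets to the network on a fixed cadence chosen
        as a fraction of link capacity, instead of as fast as possible.
        `send` is whatever function actually transmits one packet."""
        interval = (packet_bytes * 8) / target_bps   # seconds between packet starts
        next_slot = time.monotonic()
        for pkt in packets:
            now = time.monotonic()
            if now < next_slot:
                time.sleep(next_slot - now)          # wait for our transmission slot
            send(pkt)
            next_slot += interval                    # cadence is fixed, independent of jitter

    # Example: 1400-byte packets at 80% of a 100 Mbit/s link (values invented).
    if __name__ == "__main__":
        sent = []
        paced_send([b"x" * 1400] * 5, sent.append, target_bps=0.8 * 100e6, packet_bytes=1400)
        print(len(sent), "packets sent")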
The cost of communication in message-passing systems can only be computed based on a large number of low-level details. Consequently, the only architectural measure they naturally suggest is a first order one, latency. We show that a second-order property, the standard deviation of the delivery times, is also of interest. Most importantly, the average performance of a large communication system depends not only on the average performance of its components, but also on the standard deviation of these performances. In other words, building a high-performance system requires components that are themselves high-performance, but their performance must also have small variance. We illustrate this effect using distributions of the BSP g parameter. Lower bounds on the communication performance of large systems can be derived from data measured over single links.
Keywords: latency, network performance, Bulk synchronous parallelism, BSP.
Postscript (Oxford TR-39-97)
European sites may wish to download it from Oxford.
A shorter version of this paper appears in the Proceedings of High Performance Computing and Networks (HPCN'98) and was reprinted in Future Generation Computer Systems 1999.
Comment: This paper illustrates the simple but little-appreciated result that having large variance in the delivery times of, e.g., individual links in a network affects not only the variance of the overall system behaviour but also its average. Thus you can tell how bad a large system is going to be just by knowing how its pieces perform. The bottom line is that manufacturers need to pay as much attention to reducing variance as to reducing averages. They don't at the moment.
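A tiny simulation (illustrative only, with an invented Gaussian model of per-link delivery time) makes the point: the exchange finishes when the slowest of $p$ links finishes, so the average completion time rises with the per-link standard deviation even though the per-link mean stays fixed.

    import random

    def mean_completion(p, mu, sigma, trials=20000):
        """Average time for a p-link exchange whose per-link delivery times are
        Gaussian with mean mu and standard deviation sigma; the exchange is only
        finished when its slowest link is."""
        total = 0.0
        for _ in range(trials):
            total += max(random.gauss(mu, sigma) for _ in range(p))
        return total / trials

    if __name__ == "__main__":
        for sigma in (0.1, 1.0, 5.0):
            print("sigma =", sigma, "-> mean completion ~", round(mean_completion(64, 100.0, sigma), 1))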
The Bulk Synchronous Parallel model costs programs using three parameters, the processor speed (s), the network permeability (g), and the superstep overhead (l). This simple model is accurate over a wide variety of applications and parallel computers. However, all real parallel computers exhibit behaviour that is not captured by the cost model.
This paper is an extensive study of the accuracy and stability of g for a wide range of parallel computers. A series of parameterised kernel benchmarks is applied to each communication system, and the resulting values of g tabulated. These values are normally distributed with small standard deviation across all of the architectures investigated.
There will always be some variation between a computer's behaviour in reality and that predicted by the cost model. We extend the cost model with a confidence parameter that describes the standard deviation in g as a function of applied traffic pattern. This can be regarded as providing parameterised kernel benchmarks for each architecture. Programmers can use this information to decide how much discrepancy they are likely to encounter when using the cost model in situations where this is critically important. In general, the results show that the BSP g parameter does accurately predict communication performance for real parallel computers.
Keywords: network performance, performance modelling, Bulk synchronous parallelism, BSP, Silicon Graphics Power Challenge, Silicon Graphics Origin 2000, IBM SP2, Cray T3D, Cray T3E, Intel Pentium Pro, Ethernet, cost modelling.
Postscript (Oxford TR-33-97)
European sites may wish to download it from Oxford.
Comment: Do BSP's ridiculously simple architectural parameters really reflect how machines behave? This paper answers that question by exhaustively analysing performance under a benchmark that applies a wide mixture of communication patterns.
Keywords: BSP, barrier synchronisation, barrier synchronization, total exchange, broadcast, performance modelling
Postscript (Oxford TR-21-96)
This paper appears as Oxford University Computing Laboratory Technical Report 96-21. European sites may wish to download it from Oxford. A version appears in the proceedings of HPCN'97, Springer LNCS1225, 762-771 and is reprinted in the March 1998 issue of Future Generation Computer Systems.
Keywords: barrier synchronisation, barrier synchronization, lock, semaphore, cache coherence, total exchange, hardware barriers, collective communication.
Postscript (Oxford TR-16-96)
This paper appears as Oxford University Computing Laboratory Technical Report 96-16. European sites may wish to download it from Oxford. A short version appears in the proceedings of Euromicro'98, Barcelona, January 1998.
Comment: Barriers have the reputation of being inherently expensive. This prejudice is ably supported by manufacturers who seem unable to design barriers that perform well (this is often because simple OS activities are horribly expensive -- local locks taking hundreds of microseconds are common). In this paper we show how to implement barriers right. For shared memory architectures, this often means tens of microseconds. For distributed memory architectures, things are worse, largely because you can't get enough control of the machine (so it does a context switch you can't avoid, for example).
Since this paper was written, things have changed with the introduction of network interface cards mapped into user process memory. Expect some new results.
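For readers curious about what a sensible shared-memory barrier looks like, here is a textbook sense-reversing barrier (a generic sketch for illustration, not one of the implementations measured in the paper; in Python it shows the structure only, since interpreter locking hides the real costs).

    import threading

    class SenseBarrier:
        """Textbook sense-reversing barrier: the last thread to arrive flips the
        shared sense and wakes everyone; earlier arrivals wait for the flip."""
        def __init__(self, n):
            self.n = n
            self.count = n
            self.sense = False
            self.cond = threading.Condition()

        def wait(self):
            with self.cond:
                my_sense = not self.sense
                self.count -= 1
                if self.count == 0:            # last arrival: reset for the next round
                    self.count = self.n
                    self.sense = my_sense
                    self.cond.notify_all()
                else:
                    while self.sense != my_sense:
                        self.cond.wait()

    if __name__ == "__main__":
        barrier = SenseBarrier(4)
        def worker():
            for _ in range(3):                 # three "supersteps"
                barrier.wait()                 # nobody starts the next step early
        threads = [threading.Thread(target=worker) for _ in range(4)]
        for t in threads: t.start()
        for t in threads: t.join()
        print("all supersteps completed")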
Keywords: Bulk synchronous parallelism, program transformation, cluster computing, dataflow.
Comment: This paper argues that the cost of a barrier is modelled fairly well by pg (i.e. a total exchange). This is indeed true for almost all architectures in use today. With this simplification, some nice BSP transformation rules exist. I've never quite been able to decide what the point of the paper is, however, which is why it's never appeared anywhere.
Keywords: BSP, program transformation, cost measures.
Comment: One of the nicest properties of BSP is that two programs running on the same platform should not, according to the model, interfere with each other's communication. This isn't quite true for architectures like the CRAY T3D and T3E, but it's not a bad assumption for most other architectures.
This makes it possible to work out how best to allocate a program to a compute server so that it is in balance, i.e. so that its communication needs match its compute needs.
Keywords: parallel programming, parallel software development, program transformation, refinement, structure-directed refinement, refinement calculus, weakest precondition, bulk synchronous parallelism.
Postscript (Queen's TR96-400)
A version of this paper appears in the Formal Methods for Parallel Programming and Applications workshop at IPPS/SPDP'98.
Comment: One of the reasons why formal methods are not popular for sequential program development is that each tiny statement in a program is produced by the application of a (possibly complex) law. Things are different in parallel programming, for the application of a single law can produce the entire top-level structure of a program. There seems to be much more hope for cost-effective applications of formal approaches in the parallel setting.
This paper illustrates this, using a small example. The big win will come when someone figures out how (or whether) data refinement can be applied to automatically generate data distributions (and the matching control distributions).
Keywords: parallel programming model, bulk synchronous parallelism, barrier synchronisation, cost modelling, superstep, locality, BSPlib, SGI Powerchallenge, Cray T3D, IBM SP2, Hitachi SR2001, Convex Exemplar, Digital Alpha Farm, Parsytec GC, MPI.
Postscript (Oxford TR-15-96, revised to final form 11th November 1996.)
This paper appears as Oxford University Computing Laboratory Technical Report 96-15. European sites may wish to download it from the BSP Worldwide Page.
An improved version of this paper appears in Scientific Programming, Vol. 6, No. 3, 1997, 249-274.
Comment: This is the definitive introduction to BSP.
Parallel Programming Models
We examine plausible motivations for both using and building computational grids. We find two reasons to use such grids: the existence of a workload in which tasks have deadlines, but the load varies over time; and the existence of an upper limit on cost-effective parallel systems, forcing replication when greater degrees of parallelism are required. We speculate that there may be scope for public grids, in which protecting the integrity of information is not guaranteed, but that there is much larger potential for virtual private grids within organizations. In both cases, the form of markets, execution planning, and pricing is likely to be different from the frictionless markets predicted in the literature.
Postscript (Queen's TR2001-450)
A version of this paper appears in the Workshop on Global and Peer-to-Peer Computing on Large Scale Distributed Systems at CCGrid 2002.
with Freeman Huang
We take the position that large-scale distributed systems are better understood, at all levels, when locality is taken into account. When communication and mobility are clearly separated, it is easier to design, understand, and implement goal-directed agent programs. We present the Spider model of agents to validate our position. Systems contain two kinds of entities: spiders, which represent service providers, and arms, which represent goal-directed agents. Communication, however, takes place only between an arm and the spider at which it is currently located. We present both a formal description of the model using the ambient calculus, and a Java-based implementation.
Postscript (Queen's TR2001-447)
(A shorter version appears in the proceedings of Third International Workshop on Mobile Agents for Telecommunication Applications, Montreal, August 2001.)
with Susanna Pelagatti
The Network of Tasks (NOT) model allows adaptive node programs written in a variety of parallel languages to be connected together in an almost acyclic task graph. The main difference between NOT and other task graphs is that it is designed to make the performance of the graph predictable from knowledge of the performance of the component node programs and the visible structure of the graph. It can therefore be regarded as a coordination language that is transparent about performance. For large-scale computations that are spread across geographically-distributed compute servers, the NOT model helps programmers to plan, assemble, schedule, and distribute their problems.
Submitted to a journal.
Conventional software development does not allow proper design choices to be made when abstract machine interfaces are involved. The best design at one level of abstraction does not necessarily lead to the best design at the next, and subsequent levels. This problem has not been obtrusive in imperative, sequential programming, but becomes much more significant for targets such as superscalar processors, FPGAs, and parallel computers.
We suggest a perspective for software development where the goal of each phase is to produce all potential implementations to be passed to the next phase. We explore the implications of this simple idea, expressing the process of software design as a sequence of traversals of transformation spaces between two abstract machines.
Postscript (Queen's TR1999-432)
We present a new model, the Networks of Tasks (NOT) model, that allows modules from skeleton-like languages to be embedded in static task graphs. The model is designed to provide transparent cost information, so that program designers can accurately predict the execution time performance of their programs as they assemble them. This is done by using an implementation technique called work-based allocation which uses adaptivity of the component node programs to execute the task graph with the same work and communication cost that is visible when the task graph is assembled. The semantics of NOT programs is simple enough that formal methods for developing them are straightforward. A refinement-based calculus for the NOT model is also outlined, and a law for handling residuals is given.
Postscript (Queen's TR1999-427)
Slides of a talk at a Dagstuhl workshop
(Invited Talk at CMPP, Sweden, June 1998).
Parallel software construction is moving beyond an early emphasis on runtime performance as the only goal, towards a more mature view in which other facets of design become as important. However, there are many difficulties that prevent the construction of a complete parallel software development methodology. These include: the variety of possible targets, and the portability issues that this raises; the difficulty of determining costs of execution, and representing them in ways that are simple enough to be useful; and our lack of knowledge about how to integrate transformations into development, which prevents a calculational style from being effective. All of this is further complicated by the fact that architectures themselves are designed pragmatically, and do not necessarily lend themselves to simple, elegant descriptions.
Comment: At the time I thought that, as a community, we hadn't gone very far attacking the problems of formal, cost-based, parallel software design. However, these problems are extremely difficult, so I don't think we've done too badly. I'm cautiously optimistic about a cost model that is confluent with respect to some useful set of program constructors, i.e. makes it possible to break programs into reasonably sized pieces, inside which costs are well-behaved.
ACM Classification: D.1 Programming Techniques, C.4 Performance of Systems, D.3.2 Language Classifications.
General Terms: Languages, Performance, Theory.
Other Keywords: Parallel programming models, parallel programming languages, general-purpose parallel computation, taxonomy, software development methods, object-oriented languages, logic programming languages.
Postscript (Revised October 1996.)
An improved version of this paper appears in Computing Surveys, June 1998.
Comment: We tried to cover all of the interesting models that were around in 1996-97. The presentation depends very strongly on our view of parallel computation and its future, which is not the view often espoused by current numerical/scientific programmers. To that extent, it's quite idiosyncratic.
Hypermedia and Applications
Existing interfaces to large-scale hypermedia such as the world wide web have poor conceptual models and poor rendering of navigational and contextual information. New technologies that make it cheaper to use three-dimensional representations suggest the use of richer conceptual models. We discuss criteria for assessing more powerful conceptual models and design decisions that have to be made to exploit richer interfaces. The Treeworld model is suggested as one attractive example of such a model.
Keywords: conceptual model, navigation, world wide web, large-scale hypermedia, 3-d glasses, search, relevance structuring, hierarchy, focus + context, visualisation, teleportation, Treeworld.
Postscript, html (Queen's TR1999-430)
A version of this paper appears in JUCS.
A description of a two-phase filtering technique that can be used to generate different versions of a document dynamically.
Keywords: Hyperwave, Microcosm, hypermedia courseware, automatic generation of tags, filter.
A short version of this paper appears in the proceedings of Ed-Media'98.
Comment: This paper illustrates the power of interposing filters between a web browser and servers. This allows every document to be processed in any desired way. Examples such as automatic glossary generation, and user-selected content customisation are illustrated. Today you might do this using XML, but there are still attractive aspects to the amount of control the reader gets using filters.
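As a minimal sketch of the interposition idea (hypothetical code, far simpler than the Hyperwave/Microcosm filtering described in the paper), a small HTTP proxy can rewrite every page on its way to the browser, for example to annotate glossary terms; the single glossary entry below is invented.

    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import urlopen

    # Invented, single-entry glossary for the sake of the example.
    GLOSSARY = {"superstep": "one compute-then-communicate phase of a BSP program"}

    class FilterProxy(BaseHTTPRequestHandler):
        """Minimal filtering proxy: point the browser at localhost:8080 for plain
        HTTP and every page is rewritten on its way through."""
        def do_GET(self):
            # In proxy mode the request line carries the absolute URL of the page.
            page = urlopen(self.path).read().decode("utf-8", errors="replace")
            for term, gloss in GLOSSARY.items():       # the filtering step
                page = page.replace(term, '<abbr title="%s">%s</abbr>' % (gloss, term))
            body = page.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), FilterProxy).serve_forever()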
with Sumit Varma.
An information management system should reduce the workload of the user. It should also learn about the user's preferences and reduce the effort of managing information. We propose a non-intrusive approach to building information management systems, based on software agents. Software agents are programs that carry out actions on behalf of the user autonomously. We describe the issues involved in designing such systems. As an example, we design BMA (a Bookmark Management Agent). BMA is a non-intrusive, lightweight software agent that manages bookmarks (web browser pointers to favourite sites).
Tests show that BMA is a useful tool for managing bookmarks. This suggests that software agents are a promising approach to information management systems, and demonstrates the feasibility of a non-intrusive agent.
Postscript (TR97-411) (big: 2.5M)
Comment: Agents that run locally and interact with users only via the file system (e.g. by creating html documents) are non-intrusive and unannoying. This paper describes one such agent.
We describe our experiences teaching a first-year programming course using the Hyperwave hypermedia system completely without lectures.
Keywords: Hyperwave, hypermedia, courseware, first-year teaching.
Poster: ACH-ALLC'97 Conference.
Comment: A short overview of our experiences using technology for first-year programming courses.
This paper is a preliminary report of our experiences offering CISC104, a first-year introductory computing course, using hypermedia courseware to completely replace lectures.
Keywords: Hyperwave, technology for learning, distance education.
A version of this paper appears in the proceedings of Ed-Media'98.
Hypermedia technology provides both an opportunity for universities to provide a better learning experience for their students, and a way to cope with funding reductions. Second-generation hypermedia systems make it cost-effective to develop and deliver multimedia courseware, while permitting learning to occur within a community. We illustrate by describing our experiences developing and offering hypermedia courses in the Department of Computing and Information Science at Queen's University. These include an advanced course in computer architecture, and a first-year programming and applications course, in which lectures were replaced by on-line courseware, using the Hyperwave (Hyper-G) system.
Postscript (TR95-386)
Parallelism and Structured Text
We present a theory of trees and tree homomorphisms, modelling structured text archives and operations on them, from which it can be seen that:
Keywords: structured text, categorical data type, software development methodology, parallel algorithms, query evaluation.
Postscript (Queen's TR95-379). A final version of this paper appears in J. Universal Computer Science, Vol.3, No.1, January 1997.
Comment: This paper shows how the homomorphic skeleton approach can be used to design new algorithms for tree problems. Structured documents provide a natural application domain for such algorithms. This paper lays out a collection of algorithms that make sense for structured text, shows how they can be expressed as homomorphisms, and hence how they can be computed efficiently.
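The flavour of the approach can be shown with a toy homomorphism over an invented document tree (a sketch in the spirit of the paper, not its formal development): one function is applied at the text leaves, another combines the results of a section's children, and since every subtree is reduced independently the evaluation parallelises naturally.

    def hom(leaf, node, t):
        """Structured-text homomorphism: `leaf` handles text fragments, `node`
        combines a section's title with the results from its children. Every
        subtree is reduced independently, which is where the parallelism comes from."""
        if t[0] == "text":
            return leaf(t[1])
        _, title, children = t
        return node(title, [hom(leaf, node, c) for c in children])

    # Invented document tree: sections contain text fragments and subsections.
    doc = ("section", "BSP",
           [("text", "supersteps cost w + hg + l"),
            ("section", "Implementation",
             [("text", "BSPlib runs on clusters"),
              ("text", "barriers can be cheap")])])

    # Two queries, each a single homomorphism: word count and full-text search.
    words = hom(lambda s: len(s.split()), lambda _t, rs: sum(rs), doc)
    hits  = hom(lambda s: [s] if "barrier" in s else [],
                lambda _t, rs: [h for r in rs for h in r], doc)
    print(words, hits)   # -> 15 ['barriers can be cheap']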
We present a generalisation of indexes based on regular languages, called indexing languages, chosen to be homomorphic images of languages generated by typical search patterns. Precomputing properties of text strings relative to indexing languages makes it fast to exclude large parts of the text from consideration before executing a direct search.
Postscript (Queen's TR95-383)
Comment: This paper shows how to mix structured text search with a limited kind of indexing that is not unlike signatures.
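A rough analogue of the idea (hypothetical code; the paper's indexing languages are considerably more general than the character signatures used here): precompute, for each block of text, a small homomorphic summary, and use it to discard blocks that cannot possibly contain a match before running the direct search.

    def signature(block):
        """Precomputed property of a block: the set of characters it contains
        (a homomorphic image of the block, so cheap to maintain)."""
        return set(block)

    def search(blocks, pattern):
        sigs = [signature(b) for b in blocks]        # built once, reused across queries
        needed = set(pattern)
        hits = []
        for block, sig in zip(blocks, sigs):
            if needed <= sig and pattern in block:   # most blocks fail the cheap test first
                hits.append(block)
        return hits

    print(search(["bulk synchronous parallel", "tree homomorphism", "barrier cost"], "rrier"))
    # -> ['barrier cost']; the first block is excluded without being scanned at all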
Postscript (Queen's TR95-380)
A much improved version of this paper appears in the Journal of Parallel and Distributed Computing, Vol.39, 1996, 115--125. If at all possible, you should read that version, rather than this.
Comment: The key contribution of this paper is that homomorphisms on trees can be decomposed into homomorphisms on trees of trees. This decomposition is not particularly obvious. Having it means that techniques, such as tree contraction, that have logarithmic running times, can be extended to trees of trees.
Postscript (Queen's TR95-381)
An improved version of this paper appears in Information Processing Letters, Vol.60, No.5, 1996, 231--235.
Comment: A nice fast algorithm based on universal hashing.