How Much Testing Should Be Taught?



Terry Shepard

Department of Electrical and Computer Engineering

Royal Military College of Canada

Kingston, Ontario, K7K 5L0, Canada

shepard@rmc.ca



Margaret Lamb

Queen's University

Kingston, Ontario, K7L 3N6, Canada

malamb@qucis.queensu.ca





Abstract

The practice of software engineering places a heavy emphasis on testing, and yet only the bare fundamentals of testing are typically taught. Even more seriously, standard practices such as inspection are hardly taught at all. Students need to understand the uses of inspection in order to understand when and how to use testing. We describe an undergraduate software specialization in a computer engineering curriculum and a graduate course on software verification and validation (V&V). The description is from the point of view of the teaching of testing, and the context that needs to be established in order to teach testing. The choice of material is difficult, and it is also difficult to find space in either the undergraduate curriculum or in the single graduate course on V&V techniques to give a satisfactory presentation of the important issues in testing. Making the curriculum issue even more awkward is our belief that a full course on testing and related issues should be available (and mandatory for software engineers) at the undergraduate level. As well, there are those who would argue that the subject matter should be broadened even further, to include topics such as the analysis of software, for purposes such as measuring various aspects of quality, or determining what the impact of proposed changes will be. We hope this paper will provoke a reaction from software academics and professionals concerning what they think should be taught at the undergraduate and graduate levels, what industrial training should be available, and what the relation between university education and industrial training in this area should be.



1 Introduction



Many authors (e.g. [1]) use testing to mean any activity concerned with verifying expected software behavior. For example, inspection ([2],[3]) is often considered to be a testing activity. In this paper, testing will also be used more narrowly, to mean the execution of computer programs, designs or specifications for the purpose of obtaining outputs whose values are then compared to the expected values of those outputs. The intended meaning should be clear from the context. Testing in the narrower sense still involves inspection and other verification activities; for example, test plans and test cases should themselves be subject to inspections and reviews. Debugging may also be considered part of testing, but we view it as essentially a process of code walkthrough or review, often aided by tools. Certainly all of these activities are related to testing, but using testing only in its broader sense has the awkward property that there is then no word whose meaning implies execution by a computer.



Testing, inspection and other V&V techniques are part of the larger discipline of software engineering. Software engineering is an emerging discipline, first identified about 30 years ago [4], but it has generally not been given the research attention from the academic community that it deserves. As a result, interest in teaching software engineering at the undergraduate level still has a lot of room for growth. This is particularly true in the teaching of V&V. Within the teaching of V&V, the questions of how to approach the teaching of testing and how much time should be devoted to it are among the more difficult points. For example, there is almost no guidance provided in the 1991 ACM/IEEE Computing Curriculum [5]. It sets a minimum of only 8 lecture hours to cover the whole of V&V, including elementary proofs of correctness, reviews, and testing. It also allows for the possibility of elective courses, but makes no attempt to define what such courses should contain. More recent curriculum proposals in software engineering do include more material in this area, but it will be several years before there are significant numbers of students in such programs, so there is still room for improvement of existing curricula.



There is a complex interplay among a number of factors in the teaching of testing. Students need to gain an appreciation for more than just testing strategies and techniques. It is essential that they learn the place of testing in software development, both in theory and in practice. The role that testing plays in many development organizations today is still much greater than it should be. Too often testing is used as the primary way to find errors. To ensure correctness, a software process needs to include rigorous specifications and formal inspection of the specifications and code. One large study [3] shows that code inspections can be up to four times more efficient at finding errors than testing. It is interesting to note that this study in part reports on the introduction of inspection in one large company, 15 years after the original paper on inspection [2]. On the other hand, in many cases, correctness is less important than time to market and flexibility to adapt to changing demands, which tends to increase the relative importance of testing.



In teaching V&V, those techniques that are most cost-effective should be emphasized. Testing is the V&V technique of last resort, primarily because it only finds failures. It must be followed by debugging to find the root causes of the failures. Debugging is generally very costly, and is the main contributor to the low cost-effectiveness of testing as a V&V technique. On the other hand, testing and debugging are intrinsically more complex than inspection, which considers the product directly, and so has less need for additional methods and tools. Its tools consist of checklists, and a few forms to be filled in and analyzed. Testing and debugging offer more opportunity for teaching, while the basics of inspection take little time to teach, and the rest must be learned through experience. Formal techniques for V&V are also important, but teaching of testing needs greater emphasis, because it is ubiquitous and often badly handled in practice. Arguments about the relative costs of testing vs. the use of formal techniques are difficult. Certainly proving programs correct is expensive, but there are many less expensive ways to use formal techniques in V&V. Too little evidence is available about their cost-effectiveness in practice. For example, it is not unusual for over 50% of the total cost of a project to be taken up in testing and debugging, but projects that spend anywhere near 50% of their budgets on formal techniques are rare. Projects that spend over 50% of their budgets on testing and debugging are of course not making cost-effective use of available resources, but since there are many such projects, the need to understand testing and debugging is increased.



Testing needs to be set in a proper context, especially for undergraduates, as an essential part of software development. No matter how carefully other V&V techniques are applied, there will still be residual faults that will be discovered only at the testing stage. Testing also has a legitimate role to play in determining when software is ready for release. Someday, there should be widespread use of testing to estimate the reliability of the tested product [6]. It is therefore important, in teaching testing, to provide an appreciation for the wide range of activities that are part of testing, and even to make students enthusiastic about working as testers. We also need to prepare them for the "real world", in which testing is often the primary V&V technique, and is subject to severe budget/resource/schedule crunch problems.



It is also important to make students realize that there are special circumstances in which a level of reliability is needed which is beyond that which can be assured or measured by normal inspection and testing. One example is the development of safety-critical software [7].



The remainder of this paper gives an overview of how V&V is taught at the Royal Military College of Canada (RMC) in Kingston, Ontario, argues for improvements that need to be made, and draws some conclusions about what can and should be taught about testing at the undergraduate and graduate levels.



2 Testing in an Undergraduate Software Engineering Specialization



The discussion of teaching testing at an undergraduate level is complicated by the fact that there are several different types of software-related undergraduate programs, all with differing goals and priorities. For example, universities may offer programs in Computer Science, Computer Engineering, Software Engineering, Information Science, etc. Each will emphasize different aspects of software, and will have different amounts of time available to devote to testing and other software engineering issues. In addition, different schools will use different computer languages and systems for training, which will influence what tools and resources will be available to the students for practicing testing. No one example can be used as a model for all undergraduate programs, but it is our intent that a description of one specific program will provide ideas and a starting place for discussion.



Our specific example is the software specialization within computer engineering as taught at RMC since 1992 [8]. Engineering students at RMC take a common core of engineering courses in their first two years, including some programming. In the third and fourth years, the computer engineering program has its own curriculum in which students choose either a hardware or software specialization.

The norm for software engineering in many computing curricula is a single overview course (e.g. [5]), plus several other courses such as structured programming, algorithms and data structures, computer architecture, operating systems, and database design. In a computer engineering curriculum, a course on real-time software design is often included. These courses are generally concerned with making things work at low levels of detail, so the problems of scale are not dealt with. Programming assignments of a few hundred lines are inadequate in and of themselves for conveying techniques and approaches for dealing with millions of lines of code, or for working on a small part of such large projects. The computer engineering program at RMC emphasizes issues of scale by including them in every software-related course, and by spreading software engineering over about 8 courses.



The following paragraphs give a brief summary to highlight V&V issues covered in four of the courses given at RMC.



In Computer Program Design, which is the first in-depth programming course, students are exposed to the concepts of top-down vs. bottom-up testing, with stubs or test drivers. They learn to use a debugger, which is helpful at this stage as a teaching tool, since it makes it easier to visualize what is happening when their programs execute. There is no discussion of inspection or other review techniques.
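
The distinction can be made concrete with a small sketch of our own (not taken from the course materials; the function names and the canned price are invented). The stub stands in for a routine that has not yet been written, so that higher-level code can be exercised top-down; the driver exercises a completed unit directly, as in bottom-up testing:

    #include <stdio.h>
    #include <assert.h>

    /* Stub: stands in for a lookup routine that has not been written yet,
       so that the higher-level code can be exercised top-down. */
    int lookup_price(int item_id)
    {
        (void)item_id;
        return 100;     /* canned answer; the real table lookup comes later */
    }

    /* Unit already implemented, exercised bottom-up by the driver below. */
    int total_cost(int item_id, int quantity)
    {
        return lookup_price(item_id) * quantity;
    }

    /* Test driver: calls the unit directly and compares actual to expected output. */
    int main(void)
    {
        assert(total_cost(42, 3) == 300);   /* 3 items at the stubbed price of 100 */
        assert(total_cost(7, 0) == 0);      /* boundary case: zero quantity */
        printf("all driver checks passed\n");
        return 0;
    }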



In Software Engineering I (which is based on [9]), students are taught specific techniques for different stages of the software development process. One of the main threads in the course is the importance of precise module interface specifications. In relation to testing, this means that unit testing can be done with reasonable confidence that few corrections will be needed, and hence little or no regression testing at the unit level. The students are taught the concepts of functional and structural testing, and the use of coverage measures to assess the adequacy of testing. They have the opportunity to practice module and integration testing in an assignment (see §4). Inspections and reviews are emphasized as precursors to testing, but there is little time to learn and practice them, and in any case, the students lack sufficient programming maturity to be effective inspectors. The students are shown part of a training video about inspections. Pedagogically, it would be better to spend more time actually doing inspections, on the "Do what I do, not what I say" principle, but this is difficult to do well.
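
The difference between the two views of testing can be illustrated with a small example of our own (not an actual course assignment). The test cases below are functional, chosen purely from the interface specification; a structural measure such as branch coverage is what reveals, independently of the specification, that a branch has not yet been exercised:

    #include <assert.h>

    /* Interface specification (informal): grade(mark) returns 'P' if
       0 <= mark <= 100 and mark >= 50, 'F' if 0 <= mark < 50,
       and '?' if mark is out of range. */
    char grade(int mark)
    {
        if (mark < 0 || mark > 100) return '?';
        return (mark >= 50) ? 'P' : 'F';
    }

    int main(void)
    {
        /* Functional (black box) cases chosen from the specification:
           one per specified behavior plus the boundaries. */
        assert(grade(50)  == 'P');
        assert(grade(49)  == 'F');
        assert(grade(0)   == 'F');
        assert(grade(100) == 'P');
        /* A structural (branch coverage) measure would report that the
           out-of-range branch is still unexercised until cases such as
           grade(-1) and grade(101) are added. */
        assert(grade(-1)  == '?');
        assert(grade(101) == '?');
        return 0;
    }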



In Real-Time Embedded System Design, students use Real-Time Object Oriented Modeling (ROOM). ROOM, supported by the ObjecTime toolset (http://www.objectime.com), embodies an approach to real-time system design based on concurrent objects whose behavior is described by a variant of Statecharts. It introduces the students to a new form of verification, namely visualization of execution at the design level. This technique is a blend of testing and inspection, in the sense that execution of a design model is driven by scenarios, represented by sets of test inputs. While we are not aware of any formal studies to confirm the value of this technique, intuition suggests that it should be very powerful. Another V&V problem dealt with in this course is the specification and verification of timing behavior. Only a small portion of the course is available to deal with this issue, but schedulability analysis and rate monotonic analysis are covered. We emphasize that verification of timing and performance issues must not be left to the testing stage alone.
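
Rate monotonic analysis is usually introduced through the Liu and Layland utilization bound: n periodic tasks with execution times C_i and periods T_i are schedulable under rate-monotonic priorities if the total utilization does not exceed n(2^(1/n) - 1). A minimal sketch of that sufficient test, applied to a hypothetical task set, is:

    #include <stdio.h>
    #include <math.h>

    /* Sufficient rate-monotonic schedulability test (Liu & Layland bound):
       n periodic tasks with execution times c[i] and periods t[i] are
       schedulable under rate-monotonic priorities if
           sum(c[i]/t[i]) <= n * (2^(1/n) - 1).
       Failing the bound does not prove unschedulability; an exact
       response-time analysis would then be needed. */
    int rm_bound_ok(int n, const double c[], const double t[])
    {
        double u = 0.0;
        for (int i = 0; i < n; i++)
            u += c[i] / t[i];
        double bound = n * (pow(2.0, 1.0 / n) - 1.0);
        printf("utilization %.3f, bound %.3f\n", u, bound);
        return u <= bound;
    }

    int main(void)
    {
        /* Hypothetical task set: (execution time, period) in milliseconds. */
        double c[] = { 1.0, 2.0, 3.0 };
        double t[] = { 5.0, 10.0, 20.0 };   /* U = 0.2 + 0.2 + 0.15 = 0.55 */
        printf("schedulable by RM bound: %s\n",
               rm_bound_ok(3, c, t) ? "yes" : "inconclusive");
        return 0;
    }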



All Software Engineering students take a full-year Design Project course, in which they design and construct a prototype system, often including several thousand lines of code, to satisfy selected criteria against which actual performance is evaluated. One of the requirements is the preparation of an acceptance test plan, which is then used at the end of the project to test the system as built. Reviews are a part of the development process, but the students typically underestimate the value of effort invested in the early stages of development, so the reviews are less valuable during the project than they might be; by the end of the project, the students have a better understanding of why reviews are important. One of the other lessons learned is how much time can disappear into testing and debugging.



In other courses, students are introduced to such topics as operating system design; object-oriented analysis, design, and programming; graphics; and database management. They test their programs, but there is no special emphasis on teaching them how to do so.



In summary, students in this program learn more about testing concepts and techniques than is usual in most other programs. They also have an opportunity to practice them. On the other hand, they learn very little about management of test suites, languages for describing and controlling the execution of test cases, strategies for selecting test cases, regression testing techniques, logging and analysis of test results, system testing issues (including performance, load, stress testing), automatic test generation, software reliability analysis, or commercial testing tools.



3 Testing in a Graduate Course on V&V



In the winter term of 92/93, a graduate course on software V&V was introduced at RMC [10]. To the best of our knowledge, this was the first such course in Canada. The course has now been given a total of six times. The calendar description of the course is as follows:

Formal techniques: proving programs correct, checking consistency and completeness. Inspections and reviews. Unit/module testing. White box and black box testing. System integration and testing. Tool support for testing. Faults vs. failures. Verification of implementation against both requirements and design. Techniques for critical software. Trustworthiness vs. reliability. Timing analysis and verification. Safety analysis. Multi-version programming. Software quality assurance, software reliability. Debugging.



The first four offerings of the course placed a heavy emphasis on formal methods, justified on the basis that the military has a large amount of safety-critical and secure software, which requires this level of verification. The shift to a greater emphasis on testing was made in 1997 on the basis that this course is essentially a foundation course for V&V methods, that testing practices need to be robust for all kinds of software, and that the special needs of formal verification should be handled in more specialized courses.



The actual order and emphasis of the topics in the course as taught in the winter term of 1998 is given below. The class normally meets three hours per week for 14 weeks, including the exam period (references to terms not defined elsewhere in this paper can be found at www.qucis.queensu.ca):



Week 1: Presentation and discussion of course outline.

Week 2: Object-oriented unit testing.

Week 3: Software quality overview. The V&V Process. IEEE Standard 1012-1986, revision.

Week 4: Inspection and Reviews.

Week 5: Overview of Testing.

Week 6: Test Automation. Test tools. Debugging.

Week 7: Languages for describing tests. TTCN.

Week 8: Strategies for selecting test cases. Automatic generation of tests.

Week 9: Test Coverage. Test Adequacy. Testing theory.

Week 10: Object-oriented testing issues. Ideas for a testing framework for ObjecTime. Regression testing.

Week 11: When to stop testing. Software reliability. Safety and Trustworthiness.

Week 12: Assertion based verification. Cleanroom.

Week 13: Formal verification of software work products. Eves. Requirements and Design based checking. Measuring Design Quality. Model Checking.

Weeks 14 to 16: Student presentations.







The class meets for 1.5 hours per week, plus additional guest lectures, which are typically 3 hours. In the past, these have included speakers relating real-world V&V experiences, and presentations of testing tools.



The course includes three major assignments. The first covers module and integration testing as well as inspection (see §4). In the second, students are asked to experiment with and evaluate a commercial testing tool, whose capabilities include automatic test generation, several levels of test coverage measurement, management of test execution, and version and configuration control of test suites. The third is a research paper. Topics have included inspection, use of static metrics to predict residual defects, formal verification, orthogonal defect classification, and object-oriented testing.



In designing such a course, there are compromises to be made between the ideal order in which to present the topics and practical considerations dealing with assignments. Ideally, the overview presentation of software quality assurance would be given right at the start of the course, to set the context for V&V. The sequencing of the remainder of the course would then be determined by the relative cost effectiveness of the three main techniques available: inspection first, then testing, and finally formal verification, to be used only if needed. In practice, it has been necessary to teach enough about testing at the start of the course to get the students started immediately with their first testing assignment.



Even at the graduate level, most students initially believe that testing is the only practical way to really find out if software is correct. They also tend to believe testing should be the primary way to find errors. There is a paradox here, since the students do appreciate that testing can never be used to demonstrate correctness completely, and they know how difficult debugging is. Students tend to have a bias against the use of inspections as a cost-effective engineering technique. In this respect, they share the views still held by many practicing software developers.



Inspection can include near formality, as in the use of proof sketches during inspection to verify intended functional behavior in the IBM Cleanroom approach (http://www.clearlake.ibm.com/MFG/solutions/cleanrm.html). Evidence to show that the near formality is cost-effective compared to more traditional inspections is lacking, but the intuitive appeal is clear, and presenting and discussing it helps to clarify the use and applicability of formal techniques. It also helps to make clear the need for simple and readable notations to unambiguously express intended functionality. Studying Cleanroom is also useful because of its emphasis on the ideal role of testing, namely that it should emphasize reliability estimation. In the ideal course sequence, the ideas implicit in Cleanroom would be covered earlier, providing a better context for testing as it should be.
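
The flavour of such proof sketches can be suggested with a small example of our own (the code and the wording of the sketch are illustrative, not taken from the Cleanroom literature): the intended function of a piece of code is recorded, and the inspection argues informally that the code implements it:

    #include <assert.h>

    /* Intended function: [s := sum of a[0..n-1]]
       Proof sketch recorded during inspection (informal):
         - before the loop, s = 0 = sum of the first 0 elements;
         - each iteration adds a[i], preserving "s = sum of a[0..i]";
         - at exit i == n, so s = sum of a[0..n-1], as intended. */
    int array_sum(const int a[], int n)
    {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    int main(void)
    {
        int a[] = { 1, 2, 3, 4 };
        assert(array_sum(a, 4) == 10);  /* quick check; the point is the argument above */
        return 0;
    }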



4 Testing Assignments



Both the undergraduate Software Engineering I course and the graduate V&V course include assignments in which the students practice testing. In earlier versions of the courses, only unit testing was done, in part because of the practical limitations of testing larger systems as part of a course. Recently one assignment in both courses has included integration testing of a small system. To make this practical, students modify an existing system rather than implementing one from scratch. The modified system is smaller than typical commercial systems, but it exposes the students to some of the problems of integration testing.



Unit testing is taught the same way at both the undergraduate and graduate levels. It is done on an object-oriented basis: a unit is a class or object. Unit testing in the course assignments is done on a black box basis, while testing adequacy is measured by a structural coverage measure. Testing objects is natural; objects have state and mostly self-contained behavior. The approach used is one developed by Hoffman, and uses a simple C-based scripting language and a supporting tool [9]. Following Parnas, Hoffman uses the term module rather than class or object, and talks about module access programs rather than methods or member functions. One of the important aspects of Hoffman's work is that module behavior is described abstractly, so test plans can be written prior to module implementation, and test scripts that implement the test plans can be written without knowing how the module is implemented. Test cases are based on driving modules into particular states, and then using access programs to examine output values that can be inferred from the expected state. Modules are also capable of signaling exceptions, and this behavior is tested as well. This approach is different from most unit testing in current practice. Conceptually, it is very important in an educational context, as it illustrates several important ideas about testing, such as white box/black box, the value of specifications in deriving test cases, the need for careful design of test plans, the importance of automating the repetitive parts of the testing process, the difficulty of testing some exceptions, and the use of specifications to derive expected test outputs.
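
The flavour of the resulting test scripts can be suggested with a minimal C sketch (this is our illustration, not Hoffman's notation or tool; the bounded stack module and its access programs are invented). Each test case drives the module into a known state through its access programs, and then checks outputs, including exception signalling, that the abstract specification implies for that state:

    #include <assert.h>
    #include <stdio.h>

    /* Illustrative module: a bounded integer stack with access programs
       s_init, s_push, s_pop and s_depth, plus an exception flag. */
    #define CAP 3
    static int items[CAP], depth, exception;

    void s_init(void)      { depth = 0; exception = 0; }
    void s_push(int x)     { if (depth < CAP) items[depth++] = x; else exception = 1; }
    int  s_pop(void)       { if (depth > 0) return items[--depth]; exception = 1; return 0; }
    int  s_depth(void)     { return depth; }
    int  s_exception(void) { return exception; }

    int main(void)
    {
        /* Test case 1: drive the module into the "two items pushed" state,
           then check outputs that can be inferred from that state. */
        s_init();
        s_push(10); s_push(20);
        assert(s_depth() == 2);
        assert(s_pop() == 20);          /* expected from the abstract specification */
        assert(s_exception() == 0);

        /* Test case 2: drive the module into the full state and check
           that the overflow exception is signalled. */
        s_init();
        s_push(1); s_push(2); s_push(3); s_push(4);
        assert(s_exception() == 1);

        printf("module test script passed\n");
        return 0;
    }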



The assignments also include code inspection. Students are given code that has been seeded with bugs. They are asked to inspect it first and find as many bugs as possible before they start testing. Some of the seeded errors cannot be found with testing. Others, for example those involving pointers, are very difficult for most of the students to find using inspection, because their knowledge of C is weak. This exercise drives home the point that inspection is most effective when done by experienced professionals.
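
A typical seeded fault of this kind is illustrated below (a hypothetical example, not one from the actual assignment): the function returns a pointer to storage that ceases to exist on return. A simple test may still print the expected string by accident, so the fault is much easier to catch by inspection than by testing:

    #include <stdio.h>

    /* Deliberately faulty code of the kind seeded into the inspection
       exercise: the address of a local array is returned, but the array
       no longer exists after the function returns. */
    char *make_greeting(const char *name)
    {
        char buf[64];                       /* storage vanishes on return */
        snprintf(buf, sizeof buf, "Hello, %s", name);
        return buf;                         /* dangling pointer */
    }

    int main(void)
    {
        printf("%s\n", make_greeting("world"));  /* may appear to work in a test */
        return 0;
    }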



The assignments could be extended to show how reliability could be estimated, and to apply the ideas of random testing based on operational profiles that represent the actual statistical frequencies of use of the access programs of the module. A presentation of these ideas is part of the graduate course, but is not covered by an assignment.
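
A sketch of what such an extension might look like (the operation names and profile values are invented): each access program is assigned its estimated relative frequency of use in the field, test invocations are drawn at random with those frequencies, and the observed failure rate then estimates reliability under representative usage:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical operational profile for the access programs of a module;
       the frequencies must sum to 1. */
    enum { OP_PUSH, OP_POP, OP_DEPTH, N_OPS };
    static const double profile[N_OPS] = { 0.50, 0.35, 0.15 };

    /* Select an access program at random according to the profile. */
    int pick_operation(void)
    {
        double r = rand() / (double)RAND_MAX, acc = 0.0;
        for (int op = 0; op < N_OPS; op++) {
            acc += profile[op];
            if (r <= acc) return op;
        }
        return N_OPS - 1;
    }

    int main(void)
    {
        int count[N_OPS] = { 0 };
        for (int i = 0; i < 10000; i++)
            count[pick_operation()]++;      /* in a real script, each draw would
                                               invoke the access program and
                                               check its result against the spec */
        printf("push %d, pop %d, depth %d\n",
               count[OP_PUSH], count[OP_POP], count[OP_DEPTH]);
        return 0;
    }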



5 Suggestions For Improvements



Current industrial testing practice includes a variety of other topics in addition to those mentioned in this paper so far. Among them are other types of testing, such as incremental testing, function testing, performance and stress testing, usability testing, installation testing, regression testing, and mutation testing. It would be interesting to include a discussion of alternative models of testing used in industry, such as the practice of teaming a tester with each development team leader. Current practice with regard to all types of testing is improving rapidly. If a wider range of types of testing is to be introduced into undergraduate or graduate curricula, it should be on the basis of their fundamental underlying principles. These fundamental principles have not been well delineated; there are very few textbooks that can even be considered in this area.



There are many other topics that should be covered, or covered more thoroughly. One is how to determine operational profiles, and how accurate operational profiles are likely to be. This is a controversial issue. Another is management of testing resources. For example, different versions of the same software may need to be tested, which means that version control is needed for test suites. There are also many research papers on testing in the academic literature, as well as many industrial experience reports. Some of these could be included. As well, a basis for evaluation and comparison of commercial testing tools could be taught. These are just examples of topics that should be considered in any serious initiative to develop improved curricula on the testing of software.



There are clear advantages to a separate course that teaches V&V techniques. It provides a context for focusing on the comparison of the different techniques available. As well, it allows a larger range of techniques to be presented, and gives a clearer indication of the importance of V&V in software development. It is also essential to include V&V material in other courses on specification, design, and implementation, since V&V is part of any software process. How much V&V material to include in other courses and how much to split off into a separate course is a difficult question in curriculum design.



6 Conclusion



This paper has presented two examples, one at the undergraduate level and the other at the graduate level, to illustrate two approaches to providing an adequate engineering education in the area of testing of software. There is no established curriculum in this area, and there is clearly a need for one. The material has legitimacy as an undergraduate subject, but there is little space in existing curricula, even in emerging undergraduate degree programs in software engineering. Testing is only one of many V&V topics that get short shrift in current curricula. Many of these topics, including testing, are reaching a level of maturity that allows them to be defined well enough to be taught to undergraduates. There will need to be more dialogue between academia and software practitioners to ensure that practical material with a sound engineering foundation is easily available in a form that is suitable for inclusion in undergraduate curricula.



The answer to the question of how much testing should be taught is determined by many factors. These include what degree is being offered, what the students expect to be doing on graduation, whether there is some minimum level of knowledge and understanding that should be acquired prior to graduation, and if so, what that level should be. We believe that there is a minimum level, and that in most degree programs intended to prepare students to do software development, it is not reached.



This paper has presented a pragmatic view of how testing has been taught over the last few years in one particular setting. It includes a collection of essential ideas. These set a context for the teaching of testing, and roughly define a minimum acceptable level that every undergraduate who is going to do software development on graduation should understand. More work is needed to make this definition precise, and to adapt it so that it can be included in standard curricula. The main purpose of this paper is to encourage the testing community to undertake this work.



References



[1] Boris Beizer, Software Testing Techniques, Van Nostrand Reinhold, 1983; 2nd Edition, 1990

[2] M.E. Fagan, "Design and Code Inspections to Reduce Errors in Program Development", IBM Systems Journal, vol. 15, no. 3, 1976, pp. 182-211

[3] Glen W. Russell, "Experience with Inspection in Ultralarge-Scale Developments", IEEE Software, January 1991, pp. 25-31

[4] Peter Naur and Brian Randell, Software Engineering, Report on a Conference sponsored by the NATO Science Committee, Garmisch, Germany, October 1968

[5] Computing Curricula 1991, Report of the ACM/IEEE-CS Joint Curriculum Task Force, IEEE Computer Society Press, 1991

[6] P. Allen Currit, Michael Dyer, Harlan Mills, "Certifying the Reliability of Software", IEEE TSE, Jan. 1986, pp. 3-11; Correction: TSE, Mar. 1989, p. 362

[7] R.W. Butler and G.B. Finelli, "The Infeasibility of Quantifying the Reliability of Life-Critical Real-Time Software", IEEE TSE 19(1) pp. 3-12 (Jan 93)

[8] Terry Shepard, "Software Engineering in an Undergraduate Computer Engineering Program", Proceedings of the 7th SEI Conference on Software Engineering Education, San Antonio, TX, 5-7 January 1994, pp. 23-34

[9] Dan Hoffman and Paul Strooper, Software Design, Automated Testing, and Maintenance: A Practical Approach, International Thomson Computer Press, 1995

[10] Terry Shepard, "On Teaching Software Verification and Validation", Proceedings of the 8th SEI Conference on Software Engineering Education, New Orleans, LA, 29 March - 1 April, 1995, pp. 375-386