Stephen W. Thomas | Publications

This page lists my conference and journal papers, technical reports, and theses.
(See me on DBLP or Google Scholar.)
(Also see the full list of SAIL, TAU, or TimeCenter publications.)

Using Information Retrieval Models to Mine Software Repositories

The Impact of Classifier Configuration and Classifier Combination on Bug Localization
Stephen W. Thomas, Meiyappan Nagappan, Dorothea Blostein, Ahmed E. Hassan
IEEE Transactions on Software Engineering, Accepted May 2013.
(abstract) (preprint) (data)

Bug localization is the task of determining which source code entities are relevant to a bug report. Manual bug localization is labor intensive, since developers must consider thousands of source code entities. Current research builds bug localization classifiers, based on information retrieval models, to locate entities that are textually similar to the bug report. Current research, however, does not consider the effect of classifier configuration, i.e., all the parameter values that specify the behavior of a classifier. As such, the effect of each parameter is unknown, as is which parameter values lead to the best performance. In this paper, we empirically investigate the effectiveness of a large space of classifier configurations, 3,172 in total. Further, we introduce a framework for combining the results of multiple classifier configurations, since classifier combination has shown promise in other domains. Through a detailed case study on over 8,000 bug reports from three large-scale projects, we make two main contributions. First, we show that the parameters of a classifier have a significant impact on its performance. Second, we show that combining multiple classifiers, whether those classifiers are hand-picked or randomly chosen relative to intelligently-defined subspaces of classifiers, improves the performance of even the best individual classifiers.
Mining Unstructured Software Repositories Using IR Models
Stephen W. Thomas
PhD Thesis, Queen's University, December 2012.
(abstract) (bib) (pdf) (university page) (data)

Mining Software Repositories, which is the process of analyzing the data related to software development practices, is an emerging field which aims to aid development teams in their day-to-day tasks. However, data in many software repositories is currently unused because the data is unstructured, and therefore difficult to mine and analyze. Information Retrieval (IR) techniques, which were developed specifically to handle unstructured data, have recently been used by researchers to mine and analyze the unstructured data in software repositories, with some success.

The main contribution of this thesis is the idea that the research and practice of using IR models to mine unstructured software repositories can be improved by going beyond the current state of affairs. First, we propose new applications of IR models to existing software engineering tasks. Specifically, we present a technique to prioritize test cases based on their IR similarity, giving highest priority to those test cases that are most dissimilar. In another new application of IR models, we empirically recover how developers use their mailing list while developing software.

Next, we show how the use of advanced IR techniques can improve results. Using a framework for combining disparate IR models, we find that bug localization performance can be improved by 14-56% on average, compared to the best individual IR model. In addition, by using topic evolution models on the history of source code, we can uncover the evolution of source code concepts with an accuracy of 87-89%.

Finally, we show the risks of current research, which uses IR models as black boxes without fully understanding their assumptions and parameters. We show that data duplication in source code has undesirable effects for IR models, and that by eliminating the duplication, the accuracy of IR models improves. Additionally, we find that in the bug localization task, an unwise choice of parameter values results in an accuracy of only 1%, where optimal parameters can achieve an accuracy of 55%.

Through empirical case studies on real-world systems, we show that all of our proposed techniques and methodologies significantly improve the state of the art.

@phdthesis{thomas_mining_2012, author={Stephen W. Thomas}, title={Mining Unstructured Software Repositories Using IR Models}, school={Queen's University}, year={2012}, }
What are developers talking about? An analysis of topics and trends in Stack Overflow
Anton Barua, Stephen W. Thomas, and Ahmed E. Hassan
Empirical Software Engineering, Accepted September 2012.
(abstract) (bib) (preprint) (data)

Programming question and answer (Q and A) websites, such as Stack Overflow, leverage the knowledge and expertise of users to provide answers to technical questions. Over time, these websites turn into repositories of software engineering knowledge. Such knowledge repositories can be invaluable for gaining insight into the use of specific technologies and the trends of developer discussions. Previous work has focused on analyzing the user activities or the social interactions in Q and A websites. However, analyzing the actual textual content of these websites can help the software engineering community to better understand the thoughts and needs of developers. In the article, we present a methodology to analyze the textual content of Stack Overflow discussions. We use latent Dirichlet allocation (LDA), a statistical topic modeling technique, to automatically discover the main topics present in developer discussions. We analyze these discovered topics, as well as their relationships and trends over time, to gain insights into the development community. Our analysis allows us to make a number of interesting observations, including: the topics of interest to developers range widely from jobs to version control systems to C# syntax; questions in some topics lead to discussions in other topics; and the topics gaining the most popularity over time are web development (especially jQuery), mobile applications (especially Android), Git, and MySQL.

@article{barua_developers_2013, author={Anton Barua and Stephen W. Thomas and Ahmed E. Hassan}, journal={Empirical Software Engineering}, title={What are developers talking about? An analysis of topics and trends in Stack Overflow}, pages={1--31}, year={2012}, }
Studying Software Evolution Using Topic Models
Stephen W. Thomas, Bram Adams, Dorothea Blostein, and Ahmed E. Hassan
Science of Computer Programming, Accepted August 2012.
(abstract) (bib) (preprint)

Topic models are generative probabilistic models which have been applied to information retrieval to automatically organize and provide structure to a text corpus. Topic models discover topics in the corpus, which represent real-world concepts by frequently co-occurring words. Recently, researchers found topics to be effective tools for structuring various software artifacts, such as source code, requirements documents, and bug reports. This research also hypothesized that using topics to describe the evolution of software repositories could be useful for maintenance and understanding tasks. However, research has yet to determine whether these automatically discovered topic evolutions describe the evolution of source code in a way that is relevant or meaningful to project stakeholders, and thus it is not clear whether topic models are a suitable tool for this task. In this paper, we take a first step towards evaluating topic models in the analysis of software evolution by performing a detailed manual analysis on the source code histories of two well-known and well-documented systems, JHotDraw and jEdit. We define and compute various metrics on the discovered topic evolutions and manually investigate how and why the metrics evolve over time. We find that the large majority (87-89%) of topic evolutions correspond well with actual code change activities by developers. We are thus encouraged to use topic models as tools for studying the evolution of a software system.

@article{thomas_studying_2013, title = {Studying Software Evolution Using Topic Models}, journal = {Science of Computer Programming}, author = {Thomas, S. W. and Adams, B. and Blostein, D. and Hassan, A. E.}, pages = {1--23}, year={2013}, }
Static Test Case Prioritization Using Topic Models
Stephen W. Thomas, Hadi Hemmati, Ahmed E. Hassan, and Dorothea Blostein
Empirical Software Engineering, Accepted June 2012.
(abstract) (bib) (preprint) (publisher) (data)

Software development teams use test suites to test changes to their source code. In many situations, the test suites are so large that executing every test for every source code change is infeasible, due to time and resource constraints. Development teams need to prioritize their test suite so that as many distinct faults as possible are detected early in the execution of the test suite. We consider the problem of static black-box test case prioritization (TCP), where test suites are prioritized without the availability of the source code of the system under test (SUT). We propose a new static black-box TCP technique that represents test cases using a previously unused data source in the test suite: the linguistic data of the test cases, i.e., their identifier names, comments, and string literals. Our technique applies a text analysis algorithm called topic modeling to the linguistic data to approximate the functionality of each test case, allowing our technique to give high priority to test cases that test different functionalities of the SUT. We compare our proposed technique with existing static black-box TCP techniques in a case study of multiple real-world open source systems: several versions of Apache Ant and Apache Derby. We find that our static black-box TCP technique outperforms existing static black-box TCP techniques, and has comparable or better performance than two existing execution-based TCP techniques. Static black-box TCP methods are widely applicable because the only input they require is the source code of the test cases themselves. This contrasts with other TCP techniques which require access to the SUT runtime behavior, to the SUT specification models, or to the SUT source code.

@article{thomas_static_2013, title = {Static test case prioritization using topic models}, author = {Thomas, S. W. and Hemmati, H. and Hassan, A. E. and Blostein, D.}, issn={1382-3256}, journal={Empirical Software Engineering}, doi={10.1007/s10664-012-9219-7}, url={http://dx.doi.org/10.1007/s10664-012-9219-7}, publisher={Springer US}, keywords={Testing and debugging; Test case prioritization; Topic models}, pages={1--31}, year={2012}, }
Explaining Software Defects Using Topic Models
Tse-Hsun Chen, Stephen W. Thomas, Meiyappan Nagappan, and Ahmed E. Hassan
Proceedings of the 9th Working Conference on Mining Software Repositories, pages 189-198. Zurich, Switzerland. June 2-3, 2012.
(abstract) (bib) (preprint) (publisher)

Researchers have proposed various metrics based on measurable aspects of the source code entities (e.g., methods, classes, files, or modules) and the social structure of a software project in an effort to explain the relationships between software development and software defects. However, these metrics largely ignore the actual functionality, i.e., the conceptual concerns, of a software system, which are the main technical concepts that reflect the business logic or domain of the system. For instance, while lines of code may be a good general measure for defects, a large entity responsible for simple I/O tasks is likely to have fewer defects than a small entity responsible for complicated compiler implementation details. In this paper, we study the effect of conceptual concerns on code quality. We use a statistical topic modeling technique to approximate software concerns as topics; we then propose various metrics on these topics to help explain the defect-proneness (i.e., quality) of the entities. Paramount to our proposed metrics is that they take into account the defect history of each topic. Case studies on multiple versions of Mozilla Firefox, Eclipse, and Mylyn show that (i) some topics are much more defect-prone than others, (ii) defect-prone topics tend to remain so over time, and (iii) defect-prone topics provide additional explanatory power for code quality over existing structural and historical metrics.

@inproceedings{chen_explaining_2012, title = {Explaining software defects using topic models}, booktitle = {Proceedings of the 9th Working Conference on Mining Software Repositories}, author = {T. Chen and S. W. Thomas and M. Nagappan and A. E. Hassan}, pages = {189--198}, year = {2012}, }
Mining Software Repositories with Topic Models
Stephen W. Thomas
Technical Report No. 2012-586, School of Computing, Queen's University, February 2012, 39+iv pages.
(bib) (pdf)

@techreport{thomas_mining_2012, title = {Mining software repositories with topic models}, number = {2012-586}, institution = {School of Computing, Queen's University}, author = {S. W. Thomas}, year = {2012}, }
Using Fuzzy Code Search to Link Code Fragments in Discussions to Source Code
Nicolas Bettenburg, Stephen W. Thomas, and Ahmed E. Hassan
Proceedings of the 16th European Conference on Software Maintenance and Reengineering, pages 319-328. Szeged, Hungary. March 27-30, 2012.
(abstract) (bib) (preprint)

When discussing software, practitioners often reference parts of the project's source code. Such references have different motivations, such as mentoring and guiding less experienced developers, pointing out code that needs changes, or proposing possible strategies for the implementation of future changes. The fact that particular parts of the source code are being discussed makes these parts of the software special. Knowing which code is being talked about the most can not only help practitioners to guide important software engineering and maintenance activities, but also act as high-level documentation of development activities for managers. In this paper, we use clone detection as a specific instance of a code search based approach for establishing links between code fragments that are discussed by developers and the actual source code of a project. Through a case study on the Eclipse project, we explore the traceability links established through this approach, both quantitatively and qualitatively, and compare code search based traceability linking to classical approaches, in particular change log analysis and information retrieval. We demonstrate a sample application of code search based traceability links by visualizing those parts of the project that are most discussed in issue reports with a Treemap visualization. The results of our case study show that the traceability links established through code search based traceability linking are conceptually different from those of classical approaches based on change log analysis or information retrieval.

@inproceedings{bettenburg_modeling_2011, title = {Using fuzzy code search to link code fragments in discussions to source code}, booktitle = {Proceedings of the 16th European Conference on Software Maintenance and Reengineering}, author = {N. Bettenburg and S. W. Thomas and A. E. Hassan}, pages = {319--328}, year = {2012}, }
Modeling the Evolution of Topics in Source Code Histories
Stephen W. Thomas, Bram Adams, Ahmed E. Hassan, and Dorothea Blostein
Proceedings of the 8th Working Conference on Mining Software Repositories, pages 173-182. Honolulu, HI, USA. May 21-22, 2011.
(abstract) (bib) (pdf) (errata) (more info)

Studying the evolution of topics (collections of co-occurring words) in a software project is an emerging technique to automatically shed light on how the project is changing over time: which topics are becoming more actively developed, which ones are dying down, or which topics are lately more error-prone and hence require more testing. Existing techniques for modeling the evolution of topics in software projects suffer from issues of data duplication, i.e., when the repository contains multiple copies of the same document, as is the case in source code histories. To address this issue, we propose the Diff model, which applies a topic model only to the changes of the documents in each version instead of to the whole document at each version. A comparative study with a state-of-the-art topic evolution model shows that the Diff model can detect more distinct topics as well as more sensitive and accurate topic evolutions, which are both useful for analyzing source code histories.

@inproceedings{thomas_modeling_2011, title = {Modeling the evolution of topics in source code histories}, booktitle = {Proceedings of the 8th Working Conference on Mining Software Repositories}, author = {S. W. Thomas and B. Adams and A. E. Hassan and D. Blostein}, pages = {173--182}, year = {2011}, }
Mining Software Repositories Using Topic Models
Stephen W. Thomas
Proceedings of the 33rd International Conference on Software Engineering (Doctoral Symposium), pages 1138-1139. Honolulu, HI, USA. May 23, 2011.
(abstract) (bib) (pdf) (poster)

Software repositories, such as source code, email archives, and bug databases, contain unstructured and unlabeled text that is difficult to analyze with traditional techniques. We propose the use of statistical topic models to automatically discover structure in these textual repositories. This discovered structure has the potential to be used in software engineering tasks, such as bug prediction and traceability link recovery. Our research goal is to address the challenges of applying topic models to software repositories.

@inproceedings{thomas_mining_2011, title = {Mining software repositories using topic models}, booktitle = {Proceedings of the 33rd International Conference on Software Engineering}, author = {Stephen W. Thomas}, pages = {1138--1139}, year = "2011" }
Validating the Use of Topic Models for Software Evolution
Stephen W. Thomas, Bram Adams, Ahmed E. Hassan, and Dorothea Blostein
Proceedings of the 10th International Working Conference on Source Code Analysis and Manipulation, pages 55-64. Timisoara, Romania. September 12-13, 2010.
(abstract) (bib) (pdf) (errata) (slides)

Topics are collections of words that co-occur frequently in a text corpus. Topics have been found to be effective tools for describing the major themes spanning a corpus. Using such topics to describe the evolution of a software system's source code promises to be extremely useful for development tasks such as maintenance and re-engineering. However, no one has yet examined whether these automatically discovered topics accurately describe the evolution of source code, and thus it is not clear whether topic models are a suitable tool for this task.

In this paper, we take a first step towards determining the suitability of topic models in the analysis of software evolution by performing a qualitative case study on 12 releases of JHotDraw, a well-studied and well-documented system. We define and compute various metrics on the identified topics and manually investigate how the metrics evolve over time. We find that topic evolutions are characterizable through spikes and drops in their metric values, and that the large majority of these spikes and drops are indeed caused by actual change activity in the source code. We are thus encouraged by the use of topic models as a tool for analyzing the evolution of software.

@inproceedings{thomas_validating_2010, title = {Validating the use of topic models for software evolution}, booktitle = {Proceedings of the 10th International Working Conference on Source Code Analysis and Manipulation}, author = {S. W. Thomas and B. Adams and A. E. Hassan and D. Blostein}, year = {2010}, pages = {55--64} }
DiffLDA: Topic Modeling in Software Projects
Stephen W. Thomas, Bram Adams, Ahmed E. Hassan, and Dorothea Blostein
Technical Report No. 2010-574, School of Computing, Queen's University, July 2010, 24 pages.
(abstract) (bib) (pdf) (more info)

Previous research has shown that topics can be automatically discovered in a software project's source code. Topics are collections of words that co-occur frequently in a text collection and are discovered using topic models such as latent Dirichlet allocation (LDA). Tracking how topics evolve, i.e., grow and spread, over time is useful for supporting software maintenance, comprehension, and re-engineering activities.

The evolution of topics is typically recovered by applying LDA to all versions of a project's source code at once, followed by post-processing to map topics across versions. Although this technique works well in applications where each version of the data is completely different, for example in the analysis of conference proceedings, the technique does not work well with source code, which typically changes only incrementally and contains significant duplication across versions. In this paper, we present a new approach, called DiffLDA, for automatically mining topic evolution in source code. The approach addresses LDA's sensitivity to document duplication by operating on the differences between versions of a source code document, resulting in a more accurate, finer-grained representation of topic evolution. We validate our approach through case studies on simulated data and two open source projects.

@techreport{thomas_difflda:_2010, title = {{DiffLDA:} {T}opic evolution in software projects}, number = {2010-574}, institution = {School of Computing, Queen's University}, author = {S. W. Thomas and B. Adams and A. E. Hassan and D. Blostein}, year = {2010}, }

Temporal Databases and Benchmarks

Benchmark Frameworks and τBench
Stephen W. Thomas, Richard T. Snodgrass, and Rui Zhang
Software: Practice and Experience, pages 1-29. 2013.
(abstract) (publisher) (preprint)

Software engineering frameworks tame the complexity of large collections of classes by identifying structural invariants, regularizing interfaces, and increasing sharing across the collection. We wish to appropriate these benefits for families of closely related benchmarks, say for evaluating query engine implementation strategies. We introduce the notion of a benchmark framework, an ecosystem of benchmarks that are related in semantically rich ways and enabled by organizing principles. A benchmark framework is realized by iteratively changing one individual benchmark into another, say by modifying the data format, adding schema constraints, or instantiating a different workload. Paramount to our notion of benchmark frameworks are the ease of describing the differences between individual benchmarks and the utility of methods to validate the correctness of each benchmark component by exploiting the overarching ecosystem. As a detailed case study, we introduce τBench, a benchmark framework consisting of ten individual benchmarks, spanning XML, XQuery, XML Schema, and PSM, along with temporal extensions to each. The second case study examines the Mining Unstructured Data benchmark framework, and the third examines the potential benefits of rendering the TPC family as a benchmark framework.
Temporal Support for Persistent Stored Modules
Richard T. Snodgrass, Dengfeng Gao, Rui Zhang, and Stephen W. Thomas
Proceedings of the 28th International Conference on Data Engineering, pages 114-125. Washington, DC, USA. April 1-5, 2012.
(abstract) (bib) (preprint) (webpage)

We show how to extend temporal support of SQL to the Turing-complete portion of SQL, that of persistent stored modules (PSM). Our approach requires minor new syntax beyond that already in SQL/Temporal to define and to invoke PSM routines, thereby extending the current, sequenced, and nonsequenced semantics of queries to PSM routines. Temporal upward compatibility (existing applications work as before when one or more tables are rendered temporal) is ensured. We provide a transformation that converts Temporal SQL/PSM to conventional SQL/PSM. To support sequenced evaluation of PSM routines, we define two different slicing approaches, maximal slicing and per-statement slicing. We compare these approaches empirically using a comprehensive benchmark and provide a heuristic for choosing between them.

@inproceedings{snodgrass_temporal_2012, title = {Temporal support for {P}ersistent {S}tored {M}odules}, booktitle = {Proceedings of the 28th International Conference on Data Engineering}, author = {Richard T. Snodgrass and Dengfeng Gao and Rui Zhang and Stephen W. Thomas}, pages = {114--125}, year = {2012} }
Adding Temporal Constraints to XML Schema
Faiz Currim, Sabah Currim, Curtis Dyreson, Richard T. Snodgrass, Stephen W. Thomas, and Rui Zhang
IEEE Transactions on Knowledge and Data Engineering, 24(8), pages 1361-1377, 2012.
(abstract) (bib) (preprint) (wiki)

If past versions of XML documents are retained, what of the various integrity constraints defined in XML Schema on those documents? This paper describes how to interpret such constraints as sequenced constraints, applicable at each point in time. We also consider how to add new variants that apply across time, so-called non-sequenced constraints. Our approach supports temporal documents that vary over both valid and transaction time, whose schema can vary over transaction time. We do this by replacing the schema with a (possibly time-varying) temporal schema and replacing the document with a temporal document, both of which are upward compatible with conventional XML and with conventional tools like XMLLINT, which we have extended to support the temporal constraints introduced here.

@article{currim_adding_2011, title = {Adding temporal constraints to {XML} {S}chema}, journal = {{IEEE} Transactions on Knowledge and Data Engineering}, author = {Faiz Currim and Sabah Currim and Curtis Dyreson and Richard T. Snodgrass and Stephen W. Thomas and Rui Zhang}, volume = {24}, number = {8}, pages = {1361--1377}, year = {2012} }
τBench: Extending XBench with Time
Stephen W. Thomas, Richard T. Snodgrass, and Rui Zhang
TimeCenter TR-92, December 2010, 63+vi pages.
(bib) (pdf) (more info)

@techreport{thomas_tbench:_2010, author = {Stephen W. Thomas and Richard T. Snodgrass and Rui Zhang}, title = {{{$\tau$Bench}: {E}xtending {XB}ench with time}}, institution = {TimeCenter}, number = {TR-93}, month= {December}, year = {2010} }
τXSchema: Support for Data- and Schema-Versioned XML Documents
Faiz Currim, Sabah Currim, Curtis Dyreson, Shailesh Joshi, Richard T. Snodgrass, Stephen W. Thomas, and Eric Roeder
TimeCenter TR-91, September 2009, 264+xiv pages.
(abstract) (bib) (pdf) (wiki)

The W3C XML Schema recommendation defines the structure and data types for XML documents. XML Schema lacks explicit support for time-varying XML documents or for time-varying schemas. An XML document evolves as it is updated over time or as it accumulates from a streaming data source. A temporal document records the entire history of a document rather than just its current state or snapshot. Capturing a document's evolution is vital to providing the ability to recover past versions, track changes over time, and evaluate temporal queries. Capturing the evolution of a document's schema is similarly important. To date, users have to resort to ad hoc, non-standard mechanisms to create schemas for time-varying XML documents and to deal with evolving schemas.

This report presents a data model and architecture, called τXSchema, for constructing and validating temporal XML documents through the use of a temporal schema. A temporal schema guides the construction of a temporal document and is essential to managing, querying, and validating temporal documents. The temporal schema consists of a non-temporal (conventional) schema, logical annotation(s), and physical annotation(s). The annotations specify which portion(s) of an XML document can vary over time, how the document can change, and where timestamps should be placed. These components can themselves individually evolve over time. The advantage of using annotations to denote the time-varying aspects is that logical and physical data independence for temporal schemas can be achieved while remaining fully compatible with both existing XML Schema documents and the XML Schema recommendation. This report also describes how to construct a temporal document by "gluing" individual snapshots into an integrated history.

This technical report is divided into three parts: concerning instance versioning, extending to schema versioning, and reviewing the entire τXSchema language. The first two parts have a parallel structure. Each begins by discussing relevant related work before providing a motivating example that illustrates the challenges of instance and schema versioning, respectively, then lists design decisions made in τXSchema concerning that challenge. Theoretical considerations (separately for instance and schema versioning), architectural considerations, and implementation details are discussed in that order in each of the two parts. Each part ends with full example schema and instance documents. The third part completes the picture with a discussion of related work and research topics to be considered in the future.

@article{currim_txschema:_2009, author = {Faiz Currim and Sabah Currim and Curtis Dyreson and Shailesh Joshi and Richard T. Snodgrass and Stephen W. Thomas and Eric Roeder}, year = {2009}, title = {{{$\tau$}XSchema}: {S}upport for data- and schema-versioned {XML} documents}, journal = {TimeCenter}, note = {TR-91} }
The Implementation and Evaluation of Temporal Representations in XML
Stephen W. Thomas
Master's Thesis, Computer Science Department, University of Arizona, March 2009.
(abstract) (bib) (pdf) (wiki)

The design space for representing temporal data in XML has thus far received limited empirical attention. As a result, designers do not fully understand the design space and trade-offs between various representational approaches. This thesis presents an initial characterization of that design space and provides a qualitative and quantitative analysis of the extreme ends of the spectrum, as well as an expressive language for describing temporal data. We extend an existing language, τXSchema, to implement three classes of representations and show that the edit-based scheme provides the best performance in terms of creation time and representation size, although the item-based and sliced-based schemes can be validated more quickly. We also provide an analysis of where temporal constraint functionality should be implemented and show that, with a few exceptions, temporal constraint functionality must be implemented within the tools and cannot be implemented in the representational schema. These results provide insight into the overall design space for temporal representations that will be useful to researchers, tool implementers, and users of τXSchema.

@mastersthesis{thomas_implementation_2009, author = {Thomas, Stephen W.}, title = {The implementation and evaluation of temporal representations in {XML}}, school = {Computer Science Department, University of Arizona}, year = {2009}, month = {March} }
