Professor of Biochemistry and Molecular & Cellular Biology
Director of Protein Information Resource
Georgetown University Medical Center
3300 Whitehaven Street NW, Suite 1200, Washington, DC 20007
Linking Text Mining, Ontology and Systems Biology
The ever-increasing volume of scientific literature now available electronically and the exponential growth of large-scale molecular sequence data have prompted active research in biological text mining and information extraction to facilitate literature-based manual curation of molecular databases. Several text mining resources have been developed, such as iProLINK (http://pir.georgetown.edu/iprolink/), which provides annotated literature data and text mining tools that serve as a knowledge link bridging UniProt and PubMed. Its tools include BioThesaurus, for identifying synonymous and ambiguous gene/protein names to support named entity recognition, and RLIMS-P, for mining protein phosphorylation objects from MEDLINE abstracts.
With systems integration becoming the driving force of 21st-century biology, researchers are systematically tackling gene functions and complex regulatory processes by studying organisms at different levels of organization. Meanwhile, bio-ontologies are emerging as critical tools in biological research, where complex data in disparate resources need to be integrated. In particular, the Open Biomedical Ontologies (OBO) Foundry has emerged as the framework for community ontology development. It aims to create a suite of orthogonal, interoperable reference ontologies organized along two dimensions: granularity (from molecule to population) and relation to time (objects, qualities, processes). Given the deluge of systems biology data, attributing evidence to experimentally validated information extracted from the scientific literature will become increasingly important for ensuring the annotation quality of databases and bio-ontologies.
Community-wide efforts such as the BioCreAtIvE challenge evaluation assess text mining and information extraction systems applied to the biological domain. Our challenge is to develop text mining tools and systems that biologists will use broadly, by bringing together text mining teams, database and ontology curators, and biologists for system development and evaluation. This will allow the text mining community to provide the link from literature to knowledge (as encapsulated by the name of this BioLINK SIG: linking of Literature, INformation and Knowledge for Biology), thereby facilitating data integration, analysis, and knowledge discovery in the systems biology context.
Biography: Dr. Wu is Professor of Biochemistry and Molecular Biology, Professor of Oncology, Director of the Bioinformatics Track, and Director of the Protein Information Resource (PIR). With a background in both biology and computer science, she has conducted bioinformatics and computational biology research for 20 years. Since 1999 she has led the development of PIR as a major public bioinformatics resource that supports genomic, proteomic and systems biology research. Dr. Wu has served on several advisory boards, including the HUPO (Human Proteome Organization) Council, the US HUPO Board of Directors, the Protein Data Bank (PDB) Scientific Advisory Board, the NIGMS Protein Structure Initiative Advisory Committee at NIH, the TeraGrid User Advisory Committee at NSF, and the ISCB (International Society for Computational Biology) Board of Directors. She has also served on numerous program committees for international bioinformatics and proteomics conferences and workshops. She has published about 130 peer-reviewed papers and three books, and has given more than 100 invited lectures. Her research interests include protein evolution-structure-function relationships, proteomics informatics and computational systems biology, biomedical text mining and ontology, and bioinformatics cyberinfrastructure.