Research Interests
- Computational Biology, Biomedical Informatics: Data Mining and
Information Retrieval
- Machine Learning
- Databases -- Similarity Queries
Computational Biology, Biomedical Informatics: Data Mining and Information Retrieval
My work concentrates on applying machine learning techniques to biomedical
data. Specific current projects include informative/functional SNP selection for disease-gene association; studying the combination of environmental and genetic factors involved in autism (these belong
in what is currently referred to as Translational Biomedicine);
protein sub-cellular localization (sorting); and biomedical literature mining,
that is, finding documents and facts of interest in the PubMed database
(and/or other text sources) using a wide variety of tools and approaches.
Much of the work is done within a probabilistic Bayesian
framework, in which we formulate the problem of finding relevant and significant
information as a search for the parameters of a specifically defined
probabilistic model.
I also work on other interesting problems in genomic sequence analysis.
Much of the work is done collaboratively, with researchers in Canada, elsewhere in North America, and Europe.
Some related publications:
(See the publications page for a complete list of relevant publications)
- A. Höglund, T. Blum, S. Brady, P. Dönnes, J. San Miguel, M.
Rocherford, O. Kohlbacher and H. Shatkay.
Significantly Improved Prediction of Subcellular Localization by Integrating
Text and Protein Sequence Data. Proc. of the Pacific Symposium on Biocomputing (PSB), 2006 (pp. 16-27).
(Also selected for oral presentation during PSB.)
PDF
- M. Chagoyen, P. Carmona-Saez, H. Shatkay, J.M. Carazo,
A. Pascual-Montano.
Discovering Semantic Features in the Literature: A Foundation for Building Functional Associations.
BMC Bioinformatics. (To appear).
- H. Shatkay.
Hairpins in Bookstacks: Information Retrieval from Biomedical Text.
Briefings in Bioinformatics, V.6, #3,
(pp. 222-238), September 2005.
PDF.
- Z. Zheng, S. Brady, A. Garg and H. Shatkay.
Applying Probabilistic Thematic Clustering for Classification
in the TREC 2005 Genomics Track.
TREC2005. (To appear).
- H. Shatkay, J. Miller, C. Mobarry, M. Flanigan, S. Yooseph and G. Sutton.
ThurGood: Evaluating Assembly-to-Assembly Mapping.
Journal of Computational Biology (JCB), V.11, #5, (pp. 800-811),
October, 2004.
PDF
- H. Shatkay and R. Feldman.
Mining the Biomedical Literature in the Genomic Era: An Overview.
Journal of Computational Biology (JCB), V.10, #6, (pp. 821-856),
December, 2003.
PDF
- H. Shatkay, S. Edwards, J.W. Wilbur and M. Boguski.
Genes, Themes and Microarrays: Using Information Retrieval for Large
Scale Gene Analysis. ISMB2000.
PDF.
- H. Shatkay, S. Edwards and M. Boguski.
Information Retrieval Meets Gene Analysis.
IEEE Intelligent Systems, Special Issue on Intelligent Systems in Biology,
V.17, #2, March/April 2002.
Abstract
A modified PDF version.
- H. Shatkay and W.J. Wilbur.
Finding Themes in Medline: Statistical Similarity Search.
IEEE, Advances in Digital Libraries (ADL), 2000.
PDF.
Machine Learning (Earlier work)
Machine Learning for robotics and for other dynamical systems.
My PhD work was about learning maps for robot navigation, represented as
collections of hidden Markov models (such collections are also known as
POMDP models). As in other forms of model learning
for dynamical systems, the input is a sequence of data, and the
learning task is to fit a model to the data.
In the robot navigation case, the input is a sequence of observations
gathered by the robot, and the output is a POMDP representing the environment
traversed by the robot. This work was done with my PhD advisor,
Prof. Leslie Kaelbling.
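To make "fitting a model to the data" a little more concrete, here is a minimal sketch of how one can score a candidate hidden Markov model against an observation sequence, using the standard scaled forward algorithm. This is a generic textbook illustration and not the learning algorithm from the dissertation: the function name `log_likelihood`, the two toy models, and the observation sequence are all made up for the example, and a real learner would adjust the parameters iteratively rather than compare two fixed models.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability of observing symbol k in state i
    """
    n_states = len(start)
    # alpha[i] is proportional to P(state i, observations so far);
    # it is renormalized at every step to avoid numerical underflow.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)          # P(obs[t] | obs[0..t-1])
        total += math.log(scale)    # accumulate the log-likelihood
        alpha = [a / scale for a in alpha]
    return total


# Hypothetical toy data: two observation symbols (0 and 1), two hidden states.
obs = [0, 0, 0, 1, 1, 1]

# Candidate 1: "sticky" states, each strongly tied to one symbol.
sticky = ([0.5, 0.5],
          [[0.9, 0.1], [0.1, 0.9]],
          [[0.9, 0.1], [0.1, 0.9]])

# Candidate 2: everything uniform, i.e. no structure at all.
uniform = ([0.5, 0.5],
           [[0.5, 0.5], [0.5, 0.5]],
           [[0.5, 0.5], [0.5, 0.5]])

print("sticky :", log_likelihood(obs, *sticky))
print("uniform:", log_likelihood(obs, *uniform))
```

The model with the higher log-likelihood explains the sequence better; learning amounts to searching the space of model parameters for a high-scoring model, typically with an iterative procedure such as Baum-Welch.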
Related publications:
- H. Shatkay and L.P. Kaelbling.
Learning Geometrically-Constrained Hidden Markov Models for
Robot Navigation: Bridging the Geometrical-Topological Gap.
Journal of AI Research (JAIR), (pp. 167-207), April, 2002.
PDF.
- H. Shatkay. Learning Hidden Markov Models with Geometrical Constraints.
UAI99. PDF.
- H. Shatkay and L.P. Kaelbling. Heading in the Right Direction.
ICML98. PDF.
- H. Shatkay and L.P. Kaelbling. Learning Topological Maps with Weak Local Odometric Information. IJCAI97.
PDF.
- The complete report is available in my PhD dissertation,
"Learning Models for Robot Navigation",
Technical Report cs-98-11, Brown University, Department of Computer
Science, Providence, RI, December, 1998.
Abstract,
Complete report in compressed postscript.
dynamics research group
- A tutorial that I participated in creating (with Tom Dean and Sonia Leach) on
learning dynamical systems.
Tutorial.
Similarity Queries (Even earlier...)
Looking in a dataset for patterns that "sort-of-look-like-this"...
I still find this problem interesting and relevant.
Related publications:
Last modified: July 4, 2008