Algorithms and Machine Learning Methods in Computational Biology, CISC490
Where Artificial Intelligence Meets Real Life...
Office: Goodwin 756
Office hours: Wednesday, 1:00-3:00pm
Note Change in Time on Wednesdays!
- Tuesday 6:30-8:00 PM, Goodwin 521
- Wednesday 6:00-7:30 PM, Goodwin 521
TA: Phil Hyoun Lee.
TA hours: Thursday, 5:00-7:00pm, Goodwin 235.
What does dynamic programming have to do with biology?
Why is Bayes' rule important for large-scale gene analysis? (And what does it mean to be Bayesian?)
What are Markov models and why is Markov "hidden"? What do these models have to
do with genes and proteins?
Why should biologists care about minimal spanning trees or shortest paths in a graph?
What is Machine Learning? How is it used in text mining,
in gene expression analysis, in protein sub-cellular localization, in...?
These are some examples of what CISC-490 is about.
Throughout the course we shall examine topics in probability and algorithms,
and learn how they combine to form current methods in computational biology.
Who should take this course?
The course is intended primarily for 3rd and 4th year biomedical-computing students.
HOWEVER, it is open to 4th year students, from all programs in the School
of Computing and in the ECE dept, who are interested in
biomedical applications and/or machine learning.
The course is quite challenging, but apparently -
to quote students who took it last year - "worth the effort".
If you are indeed interested
in how real algorithms and real models
are used to solve real current problems in computational biology,
this course is for you.
If you are unsure whether CISC-490 is right for you,
please contact the instructor.
* Understanding of basic concepts in probability and statistics is assumed.
* CISC-365 highly recommended (can be taken in conjunction with CISC490).
* CISC-352 a plus (can be taken in conjunction with CISC490).
* Background in Molecular Biology is an advantage - but not required.
As there is no single textbook that covers all the topics,
the major source of information will be the material covered in class.
Copies of my slides (which contain high-level points and illustrations)
will be handed out at the beginning of each class.
Detailed explanations will be given during class.
No textbook can substitute for regular class attendance and good class notes.
As for the textbook, the primary book is:
Biological Sequence Analysis.
R. Durbin, S. Eddy, A. Krogh, G. Mitchison. 1998.
A number of copies
are reserved in the Engineering and Science Library.
However, it will be used for only part of the course.
Note that we shall neither cover all of the material in the book,
nor use it exclusively.
Additional material will be distributed and used
throughout the course. Some will be given as handouts in class.
The rest will be available online (see list below).
Note: Some links may only be accessed from departmental machines due to copyright issues.
- Brudno M, Malde S, Poliakov A, Do C, Couronne O, Dubchak I, Batzoglou S.
Glocal Alignment: Finding Rearrangements During Alignment.
Bioinformatics 19: 54-62, 2003.
- Shatkay H, Miller J, Mobarry C, Flanigan M, Yooseph S, Sutton G.
ThurGood: Evaluating Assembly-to-Assembly Mapping.
Journal of Computational Biology (JCB), 11(5), pp. 800-811, 2004.
- Rabiner L.R. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.
Proc. of the IEEE, 77(2), 257-285, 1989.
- Burge C, Karlin S. Prediction of Complete Gene Structures in Human Genomic DNA.
J. Mol. Biol. 268, 78-94, 1997.
- Pop M, Salzberg SL, Shumway M.
Sequence Assembly: Algorithms and Issues.
IEEE Computer 35(7), pp. 47-54. 2002.
- Shamir R, Sharan R.
Algorithmic Approaches to Clustering Gene Expression Data.
Current Topics in Computational Biology. MIT Press. 2002.
- Golub T.R. et al.
Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring.
Science, 286. pp. 531-537. 1999.
- Eisen MB, Spellman PT, Brown PO, Botstein D.
Cluster Analysis and Display of Genome-Wide Expression Patterns. PNAS, Vol. 95. pp. 14863-14868. 1998.
- Horton P. and Nakai K.
Better Prediction of Protein Cellular Localization Sites with the K Nearest Neighbors Classifier. ISMB, 1997.
- Hoeglund A. et al.
MultiLoc: Prediction of Protein Subcellular Localization using N-terminal
Sequences, Sequence Motifs and Amino Acid Composition.
Bioinformatics, 22(10), pp. 1158-1165, 2006.
- N. Friedman et al.
Using Bayesian Networks to Analyze Expression Data.
Journal of Computational Biology, 7(3/4), pp. 601-620. 2000.
- Lee P.H. and Shatkay H.
BNTagger: Improved Tagging SNP Selection Using Bayesian Networks
- N. Friedman
Inferring Cellular Networks Using Probabilistic Graphical Models.
Science, 303, pp. 799-805. 2004.
- de Bruijn B, Martin J.
Getting to the (C)ore of Knowledge: Mining Biomedical Literature.
Int. Journal of Medical Informatics 67, pp. 7-18. 2002.
- Shatkay H.
Hairpins in Bookstacks: Information Retrieval from Biomedical Text.
Briefings in Bioinformatics, 6(3), pp. 222-238. 2005.
- Shatkay H, Edwards S, Wilbur WJ, Boguski M.
Genes, Themes and Microarrays: Using Information Retrieval for Large Scale Gene Analysis. ISMB. 2000.
Supplementary recommended books reserved in the Engineering and Science Library:
- An Introduction to Bioinformatics Algorithms.
N.C. Jones and P.A. Pevzner. 2004. [QH324.2.J66].
- Bioinformatics: The Machine Learning Approach.
P. Baldi and S. Brunak. 2nd Ed. 2001.
- Molecular Biology of the Cell. B. Alberts et al. 4th Ed. 2002.
- Machine Learning. T.M. Mitchell. 1997. [Q325.5 M58].
- Pattern Classification and Scene Analysis.
R.O. Duda and P.E. Hart. 1973. [Q327 .D83]. (Chapter 6 in particular).
- Artificial Intelligence: A Modern Approach.
S. Russell and P. Norvig. 2nd Ed. 2003. [Q335.R86]. (Chapters 18-20).
Web sites for some biological background:
- Introduction to Molecular Biology for Computer Scientists, by L. Hunter.
- Introduction to protein synthesis, by M.W. King.
- Introduction to gene expression and regulation, by M.W. King.
- A Quick Introduction to Elements of Biology - Cells, Molecules, Genes, Functional Genomics, Microarrays, by A. Brazma, H. Parkinson, T. Schlitt and M. Shojatalab.
Last update: March 5, 2007