Algorithms and Machine Learning Methods in Computational Biology, CISC490

Where Artificial Intelligence Meets Real Life...

Winter, 2007

Instructor: Dr. Hagit Shatkay

Office: Goodwin 756; Office hours: Wednesday, 1:00-3:00pm

Meeting times:

TA: Phil Hyoun Lee.
TA hours: Thursday, 5:00-7:00pm, Goodwin 235.


What does dynamic programming have to do with biology?
Why is Bayes rule important for large-scale gene analysis? (and what does it mean to be Bayesian???)
What are Markov models and why is Markov "hidden"? What do these models have to do with genes and proteins?
Why should biologists care about minimal spanning trees or shortest paths in a graph?
What is Machine Learning? How is it used in text mining, in gene expression analysis, in protein sub-cellular localization, in...?

These are some examples of what CISC-490 is about. Throughout the course we shall examine topics in probability, algorithms and machine learning, and learn how they combine to form current methods in computational biology.

Who should take this course?

The course is intended primarily for 3rd and 4th year biomedical-computing students.
However, it is also open to 4th year students from all programs in the School of Computing and in the ECE dept who are interested in biomedical applications and/or machine learning.
The course is quite challenging but, to quote students who took it last year, "worth the effort".
If you are indeed interested in how real algorithms, real models and real programs are used to solve real current problems in computational biology, this course is for you.
If you are unsure whether CISC-490 is right for you - please contact the instructor.


Prerequisites

* Understanding of basic concepts in probability and statistics is assumed (e.g. STAT-263)
* CISC-235
* CISC-365 highly recommended (can be taken in conjunction with CISC-490).
* CISC-352 a plus (can be taken in conjunction with CISC-490).
* Background in Molecular Biology is an advantage - but not required.

Course material

As there is no single textbook that covers all the topics, the major source of information will be the material covered in class.
Copies of my slides (which contain high-level points and illustrations) will be handed out at the beginning of each class.
Detailed explanations will be given during class.
No textbook can substitute for regular class attendance and good class notes.

The primary textbook is: Biological Sequence Analysis. R. Durbin, S. Eddy, A. Krogh, G. Mitchison. 1998.
A number of copies are reserved in the Engineering and Science Library. However, the book will be used for only part of the course.

Note that we shall neither cover all of the material in the book, nor use it exclusively. Additional material will be distributed and used throughout the course. Some will be given as handouts in class. The rest will be available online (See list below).

Online Reading

Supplementary recommended books reserved in the Engineering and Science Library:

Web sites for some biological background:

Last update: March 5, 2007