# CISC 859 Pattern Recognition, Winter 2019

Instructor: Dorothea Blostein, 720 Goodwin Hall, 533-6537, blostein@cs.queensu.ca

Lectures: Mondays and Wednesdays 8:30-9:45AM, Goodwin 521

Office hours: Mondays and Wednesdays 10:00-11:00AM, Goodwin 720

## Exam

The exam is 1:00-4:00 on Friday April 12 in Goodwin 247.

A review session will be scheduled on April 8, 9, or 10.

The exam tests your mastery of material covered in the assignments. It will include major questions on the following two topics:

• The Bayes classifier: a question similar to Assig 1 problem 8 or Assig 2 problem 1. Also make sure you understand Assig 2 problem 2. That problem is a bit more complex than what I would put on an exam, but you should understand how to use the Bayes classifier in a situation like this, where one density is uniform and the other is normal.
• The Anderson math grammar: a question similar to Assig 4 problem 5 (but with a simpler expression and a smaller parse tree).

The remainder of the exam will be shorter questions on various topics, similar to the following assignment problems. (Starred problems were mentioned above.)
```
Assignment 1: problems 1 2 3 7 *8*
Assignment 2: problems *1* *2* 3 4
Assignment 3: problems 1b 1c 2 3a 3b 4 5 6
Assignment 4: problem [1] 2 *5* 6
```
I wrote brackets around Assig 4 problem 1 because I do not expect you to memorize the sum-of-squared-error criterion or the answer to this question; you should be prepared to answer general questions about clustering. Omitted from the list above are problems that are too detailed or too time-consuming for an exam.
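
As a study aid for the clustering topic, here is a minimal sketch of the sum-of-squared-error criterion in its standard textbook form: for each cluster, add up the squared distances from its points to the cluster mean, then sum over clusters. The function name `sum_squared_error` and the two toy partitions are hypothetical illustrations, not taken from Assig 4.

```python
def sum_squared_error(clusters):
    """Sum, over all clusters, of squared distances from each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        # Mean vector of this cluster.
        mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        # Squared Euclidean distance of every point to that mean.
        total += sum((p[d] - mean[d]) ** 2 for p in points for d in range(dim))
    return total


# Hypothetical toy data: the same four points, partitioned two different ways.
compact = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
stretched = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print(sum_squared_error(compact))    # smaller value: tight clusters
print(sum_squared_error(stretched))  # larger value: elongated clusters
```

A lower value of the criterion means the points lie closer to their cluster means, which is why the compact partition scores better here.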

The exam is done without aids: closed-book and no calculators. Doing this allows me to ask simple questions such as "Describe what curse of dimensionality means" or "Give an example of training on the test data: describe a series of training&testing steps where this problem occurs". I will provide a copy of the Anderson Math Grammar with the exam, along with the derivation sequence from "expression" to "letter" (top left of course reader page 88).
I realize that being without a calculator might make you prone to simple math errors. I will not deduct marks for obvious errors like 2*3=5, provided that you are clearly showing your work so I can see that you are correct in how you are attempting to find the answer.
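
For the Bayes classifier topic listed above, the following sketch shows one way to apply the decision rule when one class-conditional density is uniform and the other is normal: choose the class with the larger product of prior and density. All numbers (equal priors, a standard normal, a uniform density on [-4, 4]) and the function names are assumptions chosen for illustration; they are not the values from Assig 2 problem 2.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def uniform_pdf(x, a, b):
    """Density of a uniform distribution on [a, b]; zero outside that interval."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def bayes_decide(x, prior_normal=0.5, mu=0.0, sigma=1.0, a=-4.0, b=4.0):
    """Pick the class with the larger prior * class-conditional density at x."""
    score_normal = prior_normal * normal_pdf(x, mu, sigma)
    score_uniform = (1.0 - prior_normal) * uniform_pdf(x, a, b)
    return "normal" if score_normal >= score_uniform else "uniform"


for x in (0.0, 3.0, 5.0):
    print(x, bayes_decide(x))
```

With these assumed values, the normal class wins near its mean, the uniform class wins in the tails inside [-4, 4], and the normal class wins again outside [-4, 4], where the uniform density is zero.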

Duda, Hart, and Stork, Pattern Classification, Second Edition, Wiley 2001.
This book is well established as the standard reference book in pattern recognition. Hardcopy is available from Queen’s bookstore and the publisher offers an e-book version.

Blostein, CISC859 Course Reader, 2019. For sale at Queen’s bookstore. The bookstore does not automatically print more copies if their stock runs out – you must make a request for them to print another copy for you.

## Course description and prerequisites

As described in the course overview, this course covers statistical and structural pattern recognition. The course material is relevant to many areas of research, including data mining, artificial intelligence, computer vision and signal processing. CISC859 has been taken by graduate students from computing, electrical engineering, mathematics, mechanical engineering, geology, chemistry, and engineering physics.

Familiarity with the following subjects is expected. Students missing this background have successfully taken the course by doing extra reading.

• Elementary calculus: Integrals, and how they relate to the area under a curve.
• Elementary probability theory: Probability distribution, probability density, random variables. I provide review.
• Elementary formal languages: Context free grammars, and how they define a language. I provide review.
• Programming: For the course project, you are expected to implement a classifier. Toolboxes such as Weka may be used.

## Marking Scheme; Information about Assignments, Presentation, Project

The CISC859 marking scheme is described in detail below. As an overview, 84% of the course mark is based on evaluated work and 16% is based on participation. The evaluated work is marked using Queen's 0 to 100 scale with this conversion: exam 25%, oral presentation 15%, written report 15%, digit recognizer work 14%, digit recognizer report 15%. I don't mark the following participation components in detail; instead, I use A (87) as the default mark for good effort: assignments 8%, plan for oral presentation 3%, fill out feedback forms for student presentations 4%, digit recognizer part 1 1%. The minimum passing mark for a graduate course is 70.

33% Assignments and exam

• 8% Four assignments, posted in the next section of this website. Assignments may be completed individually, or in groups of two or three students. The assignment mark is based on effort: I quickly assess assignment completion rather than marking in detail. Please see me in office hours if you want more detailed feedback on your assignment answers.
• 25% Exam. Questions are based on the assignments, with one major question about the Bayes classifier and another major question about Anderson's grammar for recognition of math notation.

37% Study of a pattern recognition topic, with oral presentation and written report. Here is a list of suggested topics; or choose your own topic.
• 3% One-page written plan for your oral presentation, due one week before your presentation. State the topic, the reference(s) you are using, the main ideas you want to convey to the audience, the background you are assuming audience members have, and a brief outline for your talk. The "main ideas you want to convey to the audience" are the underlying themes of your talk, the take-home messages that you want to present to audience members in such a convincing and memorable way that they will remember these points even a year later. Your entire presentation is planned around the goal of conveying these main ideas to the audience. You can state the main ideas at the start of your talk, or introduce them gradually during the talk; in any case, summarize the main ideas at the end of your presentation.
• 15% Oral presentation to the class. This is my evaluation of your success in presenting according to the plan you submitted: did you convey the main points in a way that is understandable to an audience with the background you assumed in the plan?
• 15% Written report, due the same day as your presentation. Here is a description of criteria I use in marking oral presentations and written reports.
Required format for this report: 2-4 pages of text (not counting figures and references) that succinctly present the main points. Please don't use tiny font to cram more text onto those pages: I find it easiest to read if font size is 12 point and line spacing is at least 15 points. If you wish, you can optionally include appendices to provide more detailed information. In my marking I will concentrate on your 2-4 pages of text, and will only read the appendices if your writing makes me eager to do so. I impose this strict page limit to give you practice in the vital skill of writing concise documents that convey the main ideas in an informative, convincing and engaging way. See my advice about technical writing.
• 4% Participation during presentations by other students: fill out a feedback form for each presenter. You don't have to fill out the form on the day you present.

30% Digit recognizer project. Here is a project description and here are sample programs for doing image I/O in C and Java, as well as digit images for classifier training and testing.
• 1% On-time submission of Digit Recognizer Part 1. This is marked pass fail (100% or 0%).
• 14% Work done for the project, as described in the final report.
• 15% Quality of the final report.
Required format for this report: 2-4 pages of text (not counting figures and references) that succinctly present the main points. Please don't use tiny font to cram more text onto those pages: I find it easiest to read if font size is 12 point and line spacing is at least 15 points. If you wish, you can optionally include appendices to provide more detailed information. In my marking I will concentrate on your 2-4 pages of text, and will only read the appendices if your writing makes me eager to do so. I impose this strict page limit to give you practice in the vital skill of writing concise documents that convey the main ideas in an informative, convincing and engaging way. See my advice about technical writing.

## Assignments and Schedule

This schedule may be adjusted slightly during the term.
• January 16 Assignment 1 is due. Please hand in hardcopy at the lecture; either handwritten or typed answers are fine.

• January 28 Assignment 2 is due
Here is a website for evaluating the Normal density, if you want to use that for problem 2b to obtain a numerical value for the probability of error when p(x | ω) is normally distributed. Alternatively, you can leave your answer for P(error) in the form of an integral.

• January 30, or earlier: Email me a description of the pattern recognition topic you choose for your oral presentation and written report. Your email should include a short description of your topic (one or two sentences is enough) as well as a bibliography with one or two references you plan to use.

• February 11 Assignment 3 is due.

• February 25 Digit recognizer part 1 is due. Please hand in hardcopy at the lecture, showing that you are able to measure features. Part 1 is marked pass/fail. Repeating the information from above about the digit recognizer project: Here is a project description and here are sample programs for doing image I/O in C and Java, as well as digit images for classifier training and testing.

• March 11 Assignment 4 is due.

• March 4 to March 27: Student presentations. A detailed schedule is distributed by email. Your one-page presentation plan is due one week before your presentation (email or hardcopy is fine). Your written report (hardcopy) is due the same day as your presentation.
Order of presentation is alphabetical by last name. If you can find a student who wants to switch times with you, that's fine with me. There are two presentations per class meeting: 35 minutes for each student and 5 minutes to transition between students. Your presentation should be about 25 minutes long, leaving 10 minutes for questions.

• April 3 Digit recognizer part 2 is due. Hand in hardcopy at lecture. We will have informal in-class discussion about your experiences with the digit recognition project.

• Final exam date will be decided by in-class discussion. Likely dates are April 11, 12, 15, 16.
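
For the Assignment 2 note above about obtaining a numerical value for P(error), the sketch below shows how a normal-density integral can be evaluated without a lookup website, using the error function from Python's standard library. The setup is an assumption made for illustration and is not problem 2b itself: two classes with equal priors and normal densities that share the same standard deviation, so the Bayes decision threshold is the midpoint between the two means.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_error_equal_priors(mu1, mu2, sigma):
    """P(error) for two equal-prior normal classes with a common sigma.

    With the threshold at the midpoint of the means, each class is
    misclassified with probability Phi(-|mu2 - mu1| / (2 * sigma)),
    and that is also the overall error probability.
    """
    half_gap = abs(mu2 - mu1) / (2.0 * sigma)
    return normal_cdf(-half_gap)


# Hypothetical values: means two standard deviations apart.
print(bayes_error_equal_priors(0.0, 2.0, 1.0))
```

When the two means coincide the function returns 0.5, which matches the intuition that the classifier can do no better than guessing.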

## Pattern Recognition Resources

### International Association for Pattern Recognition

IAPR is the International Association for Pattern Recognition. IAPR maintains this list of internship positions in pattern recognition for undergraduate and graduate students.
The IAPR education committee provides researcher/student resources for three areas of core technology (symbolic PR; statistical PR; machine learning) and two broad families of application areas (1D signal analysis; 2D image analysis). For each area, they provide links to tutorials and surveys; explanations; online demos; datasets; books; code.
I also recommend taking a look at the information provided by the Technical Committees of the IAPR, including:
• TC 1 Statistical Pattern Recognition Techniques.
• TC 2 Structural and Syntactical Pattern Recognition.
• TC 3 Neural Networks and Computational Intelligence
• TC 7 Remote Sensing and Mapping.
• TC 10 Graphics Recognition. Analysis of engineering drawings, maps, tables, forms, drawings, math notation, music notation, etc.
• TC 11 Reading Systems. OCR (Optical Character Recognition), document image processing, pen-based computing, signature verification.
• TC 15 Graph Based Representations in Pattern Recognition and Image Analysis.
• TC 20 Pattern Recognition for Bioinformatics

### Software Environments and Tools

Toolbox by Duda, Hart, Stork written in MATLAB. Read this Introduction to the DHS toolbox by graduate student Nawei Chen and me; it illustrates some of the basic pattern recognition ideas we discuss in class. You are not required to buy or use this toolbox. You can use other software tools such as the ones listed below.

R is a free software environment for statistical computing and graphics. A past CISC 859 student writes: "R implements a lot of the concepts that you discuss in the course, and provides many built-in datasets (like the IRIS dataset and a car manufacturer dataset), so it's a great way to get some fast and easy experience with classifiers from a practical perspective. Coupled with the more theoretical perspective provided by the lectures, I found it to be great one-two punch."

Weka. This Weka tutorial is recommended by a former CISC859 student.

OpenCV (Open Source Computer Vision Library) has C++, C, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android.

Mirage is a publicly available Java-based tool for exploratory analysis and visualization of large data sets written by Tin Kam Ho at Bell Labs; she now works at IBM Watson. Tin is one of the world's top researchers in statistical pattern recognition.

Comparisons of these tools:

### Computer Vision Resources

CVonline, a compendium of computer vision. Covers many topics, such as Hidden Markov Models (HMMs).
Supplemental information with CVonline: online and hardcopy books, datasets for research and student projects, and software packages.

Video lectures for an introductory course on computer vision. Topics include flat part recognition, deformable part recognition, range data and stereo data 3D part recognition, detecting and tracking objects in video, and behaviour recognition.

### Other Resources

An extensive list of links for pattern recognition and statistics.

Document Layout Interpretation and its Application lists research groups, conferences, data sets, software, and bibliographies.