CISC873 Final Exam 2004

The final exam will be available at 5 p.m. on Wednesday, December 12th, and is due by 5 p.m. on Friday, December 14th. Solutions should be submitted either on paper to my office or electronically as a PDF document. I expect the exam to take about half a day.

Answers may be at most 6 pages at 11pt or 7 pages at 12pt, including figures and references. There won't be room for everything you might want to say, so be selective, and cite references for anything that is standard rather than describing it.

The exam will be in the form of a data mining problem. I expect you to do a little actual mining, but what I am mostly looking for is a coherent explanation of how you would attack the specified problem and what you would expect to find at each stage. You may want to use data mining tools to validate your assumptions and to check that your reasoning is consistent with the models you generate.

Above all, quality not quantity.

Please attach a page containing the following statement, and sign it:

I certify that the submitted work was done by me without help from any other person.

The dataset contains data about 500 objects, with 35 attributes, and labels for three classes, given by attribute 36. Your task is to build the best predictor you can for this data. There is some uncertainty about the class labels, however, so you may want to explore whether some objects have been mislabelled, or whether there are plausibly more than three classes in this data.
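One possible first pass at this problem is sketched below as a minimal, stdlib-only Python example. It rests on several assumptions. The data file is not specified here, so the hypothetical helper `make_synthetic` generates a stand-in of the same shape (500 objects, 35 attributes, three labels) with a few deliberately flipped labels. Attributes are z-scored, and leave-one-out k-nearest-neighbour classification serves as a baseline predictor. Objects whose neighbours outvote their given label are flagged as possible mislabellings, a simple neighbour-disagreement heuristic. Finally, k-means within-cluster error for k = 1 to 6 is printed as a rough elbow check for more than three groups. These are common starting heuristics, not the method the exam requires.

```python
import heapq
import math
import random
from collections import Counter


def make_synthetic(n=500, d=35, n_classes=3, flip_rate=0.05, seed=0):
    """Hypothetical stand-in for the exam data: Gaussian blobs with some
    deliberately flipped labels. Returns (rows, labels, flipped_indices)."""
    rng = random.Random(seed)
    centres = [[rng.gauss(0, 3) for _ in range(d)] for _ in range(n_classes)]
    rows, labels, flipped = [], [], []
    for i in range(n):
        c = i % n_classes
        rows.append([m + rng.gauss(0, 1) for m in centres[c]])
        if rng.random() < flip_rate:
            # Replace the true label with a different one to simulate mislabelling.
            c = (c + rng.randint(1, n_classes - 1)) % n_classes
            flipped.append(i)
        labels.append(c)
    return rows, labels, flipped


def zscore(rows):
    """Standardise each attribute so that no single one dominates distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_knn_predictions(rows, labels, k=5):
    """Leave-one-out k-NN: predict each object from its k nearest *other* objects."""
    preds = []
    for i, r in enumerate(rows):
        nearest = heapq.nsmallest(
            k, ((sq_dist(r, s), j) for j, s in enumerate(rows) if j != i))
        votes = Counter(labels[j] for _, j in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds


def kmeans_sse(rows, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the within-cluster sum of squared errors."""
    rng = random.Random(seed)
    centres = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(r, centres[c])) for r in rows]
        for c in range(k):
            members = [r for r, a in zip(rows, assign) if a == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return sum(sq_dist(r, centres[a]) for r, a in zip(rows, assign))


if __name__ == "__main__":
    raw, labels, flipped = make_synthetic()
    rows = zscore(raw)
    preds = loo_knn_predictions(rows, labels, k=5)
    acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    suspects = [i for i, (p, y) in enumerate(zip(preds, labels)) if p != y]
    print(f"Leave-one-out 5-NN accuracy: {acc:.3f}")
    print(f"{len(suspects)} objects disagree with their neighbours; "
          f"{len(set(suspects) & set(flipped))} of {len(flipped)} planted flips among them")
    for k in range(1, 7):
        print(f"k={k}: within-cluster SSE = {kmeans_sse(rows, k):.0f}")
```

A neighbour disagreement only marks a candidate. A flagged object may instead belong to a genuinely separate group, which is why the clustering check runs alongside it: a clear drop in SSE beyond k = 3 would suggest looking for an extra class rather than relabelling.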