CISC 333 Exercise Sheet 5



Due beginning of class, Friday October 21st
This sheet is expected to take you: 3 hours

Questions:

  1. Use the Support Vector Machine operator (libSVM) to predict the classes. The Galaxy dataset has more than two classes, but the operator automatically uses one-against-the-rest to solve multiclass problems. Start with the 3-class labelling. Experiment with both the C parameter (the penalty for points that fall inside the block, that is, within the margin) and the choice of kernel.

    Now repeat with the 4-class labelling. Would it help to merge some of the classes and run a 2-class prediction directly? What about improving prediction of one of the smaller classes by merging all of the others? Try it and see.

  2. You've built three kinds of prediction models on this dataset. Create a table comparing their performance and write a few paragraphs explaining how you would choose a prediction technique for data like this.

  3. Suppose that the data had had 100,000 rows. How would this have changed your decision process? Suppose instead that the data had only 50 rows but 500 attributes. How would this have changed your decision process?
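
Note on questions 1 and 2: the class merging in question 1 and the comparison table in question 2 both come down to simple bookkeeping on label columns. The following is a minimal Python sketch of that bookkeeping, assuming you have exported the actual and predicted labels as two lists. The function names (merge_labels, per_class_recall) and the labels "a", "b", "c" are hypothetical and are not taken from the Galaxy dataset or from the SVM operator.

```python
from collections import Counter


def merge_labels(labels, keep):
    """Turn a multiclass labelling into a 2-class one.

    The class named by `keep` is left alone; every other class
    is merged into a single class called "other".
    """
    return [y if y == keep else "other" for y in labels]


def per_class_recall(actual, predicted):
    """For each actual class, the fraction of its rows predicted correctly."""
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    # Counter returns 0 for classes that were never predicted correctly.
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical labels for illustration only.
actual = ["a", "a", "a", "a", "b", "b", "c"]
predicted = ["a", "a", "a", "b", "b", "a", "a"]

# Per-class recall shows how badly a small class can do
# even when overall accuracy looks reasonable.
recall = per_class_recall(actual, predicted)

# Relabel so that the small class "c" is predicted against all the rest.
binary_actual = merge_labels(actual, keep="c")
```

Reporting per-class recall alongside overall accuracy makes it easier to see whether merging the other classes really helped the smaller class, rather than just shifting errors around.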