Assorted Matlab Functions


These functions might save you a little time when you're getting your data mine on.

scale(data) scales the values of each column so that they fall between 0 and 1.
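The actual scale source isn't shown here, so the following is only a minimal sketch of min-max scaling as described. The variable names and the handling of constant columns (mapped to 0 instead of dividing by zero) are assumptions, not necessarily what the function does.

```matlab
% Min-max scaling of each column to [0, 1] (illustrative sketch).
data = [1 10; 2 20; 4 40];

colMin   = min(data, [], 1);              % 1-by-n row of column minima
colRange = max(data, [], 1) - colMin;     % 1-by-n row of column ranges
colRange(colRange == 0) = 1;              % assumption: constant columns map to 0

% Subtract each column's minimum, then divide by its range.
scaled = bsxfun(@rdivide, bsxfun(@minus, data, colMin), colRange);
```

Each column of scaled now has a minimum of 0 and a maximum of 1.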

standardize(data) standardizes each column to have a mean of 0 and a standard deviation of 1.
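A rough sketch of column standardization, assuming the sample standard deviation (normalized by N-1) is what's wanted; the guard for constant columns is also an assumption.

```matlab
% Standardize each column to zero mean and unit standard deviation (sketch).
data = [1 10; 2 20; 3 30];

mu    = mean(data, 1);        % column means
sigma = std(data, 0, 1);      % sample std of each column (N-1 normalization)
sigma(sigma == 0) = 1;        % assumption: leave constant columns at 0

standardized = bsxfun(@rdivide, bsxfun(@minus, data, mu), sigma);
```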

findKEntryCols(data, k) returns the indices of the columns with k or fewer greater-than-zero entries. It wouldn't be hard to modify it to count non-zero entries instead.
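The counting behind this can probably be sketched in two lines. This is an illustration of the idea, not the function's actual source; the variable names are made up.

```matlab
% Columns with k or fewer greater-than-zero entries (sketch).
data = [0  1 3;
        0  0 2;
        5 -1 1];
k = 1;

posCounts = sum(data > 0, 1);         % positive entries per column: [1 1 3]
cols      = find(posCounts <= k);     % columns 1 and 2

% Non-zero variant: count entries that differ from zero instead.
nzCounts = sum(data ~= 0, 1);         % [1 2 3]
colsNz   = find(nzCounts <= k);       % column 1 only
```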

getNotIn(vector, max) returns a vector of the numbers from 1:max that aren't in vector. Useful with findKEntryCols if you want the indices of the columns that have more than k entries.
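The same result can be had from the built-in setdiff, as in the sketch below. The name maxIdx is used here instead of max only to avoid shadowing the built-in max; the example data is made up.

```matlab
% Numbers in 1:maxIdx that are not in vector (sketch using setdiff).
vector = [2 5];
maxIdx = 6;
notIn  = setdiff(1:maxIdx, vector);            % [1 3 4 6]

% Complement of a "k or fewer positive entries" column list.
data     = [0 1 3; 0 0 2; 5 -1 1];
k        = 1;
fewCols  = find(sum(data > 0, 1) <= k);        % [1 2]
manyCols = setdiff(1:size(data, 2), fewCols);  % 3
```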

tvalue(x, y) takes two vectors of numbers, x and y (the values of a feature that were labelled positive and negative, perhaps), and returns the t-value from a t-test comparing their means and variances. It gives you an idea of how well the feature's values predict their classes.
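The description doesn't say which t-test is used, so the sketch below assumes the unequal-variance (Welch) form, t = (mean(x) - mean(y)) / sqrt(var(x)/nx + var(y)/ny). A pooled-variance version would differ in the denominator.

```matlab
% Two-sample t-statistic, unequal-variance (Welch) form (sketch).
x = [5 6 7];     % e.g. feature values for the positive class
y = [1 2 3];     % e.g. feature values for the negative class

nx = numel(x);
ny = numel(y);

% Difference of means over its estimated standard error.
t = (mean(x) - mean(y)) / sqrt(var(x) / nx + var(y) / ny);
```

A large absolute t means the two groups are well separated relative to their spread. If you have the Statistics Toolbox, ttest2 with 'Vartype' set to 'unequal' should report the same statistic in its stats output.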

crossVal(k, label, data, params) You'll need to rewrite this one if you're using it for something other than libsvm. k is the number of folds, label is your row labels, data is your data, of course, and params are the parameters of the svm. Just put your own model in place of the train and test calls in the function. It returns a vector of accuracies, one from each run.
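To show the shape of the loop without needing libsvm, here is a hedged sketch of k-fold cross-validation. trainFn and predictFn are hypothetical stand-ins (a nearest-centroid classifier) marking where the train and test calls would go; the fold assignment is deterministic here, where a real run would usually shuffle the rows first.

```matlab
% k-fold cross-validation with a pluggable model (sketch).
data  = [0 0; 0 1; 1 0; 1 1; 5 5; 5 6; 6 5; 6 6];
label = [-1; -1; -1; -1; 1; 1; 1; 1];
k     = 4;

% Hypothetical stand-in model: nearest class centroid, labels +1 / -1.
% With libsvm these would be calls to svmtrain and svmpredict instead.
trainFn   = @(y, X) struct('pos', mean(X(y == 1, :), 1), ...
                           'neg', mean(X(y == -1, :), 1));
predictFn = @(m, X) 2 * (sum(bsxfun(@minus, X, m.pos).^2, 2) < ...
                         sum(bsxfun(@minus, X, m.neg).^2, 2)) - 1;

n    = numel(label);
fold = mod(0:n-1, k)' + 1;     % row i goes to fold mod(i-1, k) + 1
acc  = zeros(k, 1);

for i = 1:k
    testIdx = (fold == i);
    model   = trainFn(label(~testIdx), data(~testIdx, :));   % train on k-1 folds
    pred    = predictFn(model, data(testIdx, :));            % test on held-out fold
    acc(i)  = mean(pred == label(testIdx));                  % fraction correct
end
```

On this toy data the classes are far apart, so every entry of acc comes out as 1.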

noiseVal(data, label, params) This needs some modification, just like crossVal. The idea is you train on all your samples and then add random noise to each column, one at a time. Testing on the perturbed data gives you an idea of how important each column is to your model's prediction. Assumes your input data has been standardized (you will want to tweak the random noise otherwise) and returns the column numbers and corresponding accuracies.
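A minimal sketch of the perturbation idea, under the same caveat as crossVal: trainFn and predictFn are hypothetical stand-ins for your own model, and the unit-variance Gaussian noise (noiseStd = 1) is an assumption that only makes sense for roughly standardized columns.

```matlab
% Per-column noise sensitivity (sketch).
% Column 1 carries the class signal; column 2 is uninformative.
data  = [-1 0.5; -1 -0.5; -1 0.5; -1 -0.5; 1 0.5; 1 -0.5; 1 0.5; 1 -0.5];
label = [-1; -1; -1; -1; 1; 1; 1; 1];
noiseStd = 1;    % assumption: noise scale for standardized data

% Hypothetical stand-in model: nearest class centroid, labels +1 / -1.
trainFn   = @(y, X) struct('pos', mean(X(y == 1, :), 1), ...
                           'neg', mean(X(y == -1, :), 1));
predictFn = @(m, X) 2 * (sum(bsxfun(@minus, X, m.pos).^2, 2) < ...
                         sum(bsxfun(@minus, X, m.neg).^2, 2)) - 1;

model = trainFn(label, data);          % train once on all samples
nCols = size(data, 2);
acc   = zeros(nCols, 1);

for j = 1:nCols
    noisy = data;
    noisy(:, j) = noisy(:, j) + noiseStd * randn(size(data, 1), 1);  % perturb column j only
    acc(j) = mean(predictFn(model, noisy) == label);                 % accuracy on perturbed data
end

result = [(1:nCols)', acc];            % column numbers next to their accuracies
```

A column whose accuracy drops when perturbed matters to the model; here column 2 never affects the prediction, so its accuracy stays at 1, while column 1's varies from run to run.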


