Hierarchical Clustering in Matlab


The three steps to cluster in Matlab

First you calculate the pairwise distances between your items (named Y) using pdist, which returns them as a compact row vector rather than a square matrix:

	Y = pdist(data, 'euclidean');

The rows of your data matrix should be the items you want clustered, and the columns should be the features. You can choose different distance measures – I used Euclidean distance in the example. Type help pdist for more information and a list of the different distance measures.
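
If you want to check what pdist produced, here is a small sketch. The four-row data matrix is made up for illustration, and squareform is used only to view the distances as a square matrix.

```matlab
% Four items (rows) described by two features (columns); made-up values.
data = [0 0; 0 1; 5 5; 5 6];

% pdist returns one distance per pair of rows, in the order
% (1,2), (1,3), (1,4), (2,3), (2,4), (3,4): n*(n-1)/2 values in total.
Y = pdist(data, 'euclidean');

% squareform expands that vector into a symmetric n-by-n matrix
% with zeros on the diagonal, which is easier to read.
D = squareform(Y);
disp(D);
```

D(i, j) is then the distance between row i and row j of data. linkage expects the vector form Y, not D.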


Next you calculate your linkage matrix (which we call Z) using linkage:

	Z = linkage(Y, 'complete');

Y is the output of pdist from before. There are more linkage options than 'complete', such as 'single', 'average' and 'ward' – you can find the full list in help linkage.
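
To get a feel for what Z contains, here is a minimal sketch on the same kind of made-up four-row data matrix (the values are an assumption for illustration only).

```matlab
% Made-up data: two tight pairs of points that are far from each other.
data = [0 0; 0 1; 5 5; 5 6];
Y = pdist(data, 'euclidean');

% Z has n-1 rows, one per merge. Columns 1 and 2 hold the indices of the
% two clusters being merged, and column 3 holds the distance between them.
% Indices above n refer to clusters created by earlier merges.
Z = linkage(Y, 'complete');
disp(Z);
```

With complete linkage, the last row joins the two pairs at the largest distance between any of their points, which is sqrt(61) for this data.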


Finally, to get a labeling of your rows (named T), use cluster on your linkage matrix:

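	T = cluster(Z, 'maxclust', k);

k is the maximum number of clusters you'd like. The older two-argument form cluster(Z, k) treats a value below 2 as a cutoff on the inconsistency coefficient instead of a cluster count (which I never use) – take a look at help cluster if you're curious.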
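
Putting the three steps together, a sketch on made-up data might look like this. It uses the name-value form cluster(Z, 'maxclust', k), which asks for at most k clusters. The data values and the printing loop are assumptions for illustration.

```matlab
% Made-up data: rows 1-2 are close together, rows 3-4 are close together.
data = [0 0; 0 1; 5 5; 5 6];

Y = pdist(data, 'euclidean');      % step 1: pairwise distances
Z = linkage(Y, 'complete');        % step 2: linkage matrix
k = 2;
T = cluster(Z, 'maxclust', k);     % step 3: one label per row of data

% T is a column vector with one entry per row of data. Which cluster is
% called 1 and which is called 2 is arbitrary, so compare labels rather
% than relying on specific numbers.
for c = 1:k
    fprintf('cluster %d: rows %s\n', c, mat2str(find(T == c)'));
end
```

Rows 1 and 2 end up with one label and rows 3 and 4 with the other.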


Visualization

If you want to see a dendrogram of your clusters, try:

	dendrogram(Z, k);

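k is the maximum number of leaf nodes shown in the dendrogram. If your data has more rows than that, the lowest branches are collapsed so that some leaves stand for several rows. Setting k = 0 shows every row as its own leaf, and leaving k out uses the default of 30.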
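
As a small sketch, this draws the full dendrogram for a made-up four-row data matrix. The data values and axis labels are assumptions for illustration.

```matlab
% Made-up data: two tight pairs of points that are far from each other.
data = [0 0; 0 1; 5 5; 5 6];
Z = linkage(pdist(data, 'euclidean'), 'complete');

figure;
% Passing 0 shows every row of data as its own leaf. The third output
% lists the rows in the left-to-right order they appear along the axis.
[H, nodes, perm] = dendrogram(Z, 0);
xlabel('row of data');
ylabel('distance at which clusters merge');
```

The height of each horizontal bar is the distance stored in the third column of Z for that merge.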


The function I used to get those images of data set 2 is:

	imagesc(matrix);  

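It takes the values of matrix and scales them to the full range of the colormap, so that you get an image with the colours running from blue (low values) to red (high values). Newer Matlab versions use a different default colormap that runs from blue to yellow; call colormap(jet) afterwards to get the blue-to-red one.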
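
A minimal sketch of this, using magic(6) as a stand-in for your own matrix (the placeholder data is an assumption, not data set 2):

```matlab
% Placeholder values; replace with the matrix you want to look at.
matrix = magic(6);

figure;
imagesc(matrix);   % smallest value maps to the first colour, largest to the last
colormap(jet);     % blue-to-red colormap
colorbar;          % adds a legend showing which colour means which value
```

Each cell of the image corresponds to one entry of matrix, with row 1 at the top.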


