Abstract:
Motivation: Many clustering algorithms have been proposed for the
analysis of gene expression data, but little guidance is available
to help choose among them. We provide a systematic framework for
assessing the results of clustering algorithms. Clustering
algorithms attempt to partition the genes into groups exhibiting
similar patterns of variation in expression level. Our methodology
is to apply a clustering algorithm to the data from all but one
experimental condition. The remaining condition is used to assess
the predictive power of the resulting clusters: meaningful clusters
should exhibit less variation in the remaining condition than
clusters formed by chance.
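The abstract does not say how variation in the held-out condition is measured. The sketch below makes two assumptions. First, it measures that variation as the root-mean-square deviation of each gene's held-out value from its cluster's mean, and sums it over every condition left out in turn. Second, it uses plain k-means as a stand-in for whichever clustering algorithm is being assessed. It compares the result against a random partition, which plays the role of "clusters formed by chance". The function names (`kmeans`, `held_out_spread`, `leave_one_condition_out`) are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (stand-in clustering algorithm); returns one label per gene."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each gene (row) to its nearest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def held_out_spread(values, labels):
    """RMS deviation of held-out values from their cluster means (assumed measure)."""
    total = 0.0
    for c in np.unique(labels):
        v = values[labels == c]
        total += ((v - v.mean()) ** 2).sum()
    return np.sqrt(total / len(values))


def leave_one_condition_out(data, cluster_fn, k):
    """Cluster on all conditions but one, score on the one left out, sum over conditions."""
    n_genes, n_conds = data.shape
    score = 0.0
    for e in range(n_conds):
        train = np.delete(data, e, axis=1)  # genes x (all conditions except e)
        labels = cluster_fn(train, k)
        score += held_out_spread(data[:, e], labels)
    return score


def random_clusters(X, k, seed=2):
    """Chance baseline: assign each gene to a uniformly random cluster."""
    return np.random.default_rng(seed).integers(0, k, size=len(X))


if __name__ == "__main__":
    # Synthetic data only: two groups of 50 genes with opposite patterns over 5 conditions.
    rng = np.random.default_rng(1)
    pattern = np.array([1.0, 1.0, 1.0, -1.0, -1.0])
    data = np.vstack([
        pattern + 0.3 * rng.standard_normal((50, 5)),
        -pattern + 0.3 * rng.standard_normal((50, 5)),
    ])
    print("k-means:", leave_one_condition_out(data, kmeans, 2))
    print("random: ", leave_one_condition_out(data, random_clusters, 2))
```

Under these assumptions, lower scores are better. A clustering that captures real co-expression should score well below the random baseline, while a clustering no better than chance should score about the same.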
Results: We successfully applied our methodology to compare seven clustering algorithms on four gene expression data sets. We found our quantitative measures of cluster quality to be positively correlated with external standards of cluster quality.
E-mail: ruzzo /at/ cs /dot/ washington /dot/ edu