Evaluation of clustering algorithms for gene expression data using gene ontology annotations.
- Authors: Ning MA; Zheng-Guo ZHANG
- Publication Type: Journal Article
- MeSH: Algorithms; Cluster Analysis; Gene Expression Profiling; Humans; Molecular Sequence Annotation
- From: Chinese Medical Journal 2012;125(17):3048-3052
- Country: China
- Language: English
Abstract:
BACKGROUND: Clustering is a useful exploratory technique for interpreting gene expression data, revealing groups of genes that share common functional attributes. Biologists frequently face the problem of choosing an appropriate clustering algorithm. We aimed to provide a standalone, easily accessible and biologically oriented criterion for evaluating the clustering of expression data.
METHODS: This work proposes an external criterion based on annotation-based similarities between genes, with Gene Ontology (GO) as the annotation source. Using this criterion, six widely used clustering algorithms were compared on various types of gene expression data sets.
RESULTS: The ranking of the algorithms produced by the criterion agrees with common knowledge. Single linkage performs significantly worse than the other methods, even worse than the random algorithm. Ward's method achieves the best performance in most cases.
CONCLUSIONS: The proposed criterion discriminates well among clustering algorithms that use different distance measures. The study also shows that analyzing the main contributors to the criterion may offer guidance in finding locally compact clusters. In addition, we recommend Ward's algorithm for gene expression data analysis.
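The abstract does not give the criterion's formula, so the following is only an illustrative sketch of how an annotation-based external criterion of this kind might be computed, not the authors' method. It assumes each gene is annotated with a set of GO term IDs, scores a gene pair by the Jaccard similarity of those sets (an assumed pairwise similarity), and rates a clustering by its mean within-cluster similarity minus its mean between-cluster similarity. A random baseline, in the spirit of the random algorithm mentioned in the results, is approximated by shuffling cluster membership while keeping cluster sizes fixed. The names `jaccard`, `annotation_score`, `random_baseline` and the toy GO annotations are hypothetical.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Assumed pairwise annotation similarity: Jaccard index of two GO term sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def annotation_score(labels, go_annotations):
    """Hypothetical external criterion: mean within-cluster minus mean
    between-cluster annotation similarity. `labels` maps gene -> cluster id."""
    within, between = [], []
    for g1, g2 in combinations(sorted(labels), 2):
        s = jaccard(go_annotations[g1], go_annotations[g2])
        (within if labels[g1] == labels[g2] else between).append(s)
    mean_w = sum(within) / len(within) if within else 0.0
    mean_b = sum(between) / len(between) if between else 0.0
    return mean_w - mean_b


def random_baseline(labels, go_annotations, trials=200, seed=0):
    """Average score of random clusterings with the same cluster sizes,
    obtained by shuffling which gene receives which cluster id."""
    rng = random.Random(seed)
    genes = sorted(labels)
    ids = [labels[g] for g in genes]
    total = 0.0
    for _ in range(trials):
        rng.shuffle(ids)
        total += annotation_score(dict(zip(genes, ids)), go_annotations)
    return total / trials


# Toy GO annotations, invented purely for illustration.
go_annotations = {
    "g1": {"GO:0006915", "GO:0008219"},
    "g2": {"GO:0006915", "GO:0008219"},
    "g3": {"GO:0007049"},
    "g4": {"GO:0007049", "GO:0000278"},
}
good = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}  # groups functionally related genes
bad = {"g1": 0, "g2": 1, "g3": 0, "g4": 1}   # splits related genes apart

print(annotation_score(good, go_annotations))  # 0.75
print(annotation_score(bad, go_annotations))   # -0.375
print(random_baseline(good, go_annotations))
```

A higher score means genes placed together share more GO annotation than genes placed apart; a real analysis would substitute the paper's own similarity measure and the clusterings produced by each algorithm under comparison.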