A supervised learning approach to the unsupervised clustering of genes

Andrew Rider, Michael Ferdig, Nitesh V Chawla Habtom, Scott Emrich, Geoffrey Siwo

doi:10.1109/bibm.2010.5706585

Abstract

Clustering is a common step in the analysis of microarray data. Microarrays enable simultaneous, high-throughput measurement of the expression levels of many genes. These data can be used to explore relationships between genes and can guide drug development and further research. A typical first step in the analysis is to apply an agglomerative hierarchical clustering algorithm to the correlations between all gene pairs. Although this simple approach has been successful, it fails to identify many genetic interactions that may be important for drug design and other applications. We present an approach to clustering expression data that uses known gene-gene interaction data to improve the results of commonly used clustering techniques. The approach creates an ensemble similarity measure that can serve as input to these techniques, yielding results with increased biological significance without altering the clustering algorithm itself.
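The idea of feeding a modified similarity measure into an unchanged clustering routine can be illustrated with a minimal sketch. The abstract does not specify how the ensemble similarity is built, so the `blend` function below is a hypothetical stand-in (a weighted mean of rescaled correlation and a known-interaction score), not the authors' method. The toy expression profiles, the `prior` matrix, the `weight` value, and all function names are likewise assumptions made for illustration. Average linkage is used as one common agglomerative variant.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_similarity(profiles):
    """Symmetric matrix of pairwise correlations between genes."""
    n = len(profiles)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = pearson(profiles[i], profiles[j])
    return sim


def blend(corr, prior, weight):
    """Hypothetical ensemble similarity (an assumption, not the paper's
    measure): weighted mean of correlation rescaled to [0, 1] and a
    known-interaction score in [0, 1]."""
    n = len(corr)
    return [
        [(1 - weight) * (corr[i][j] + 1) / 2 + weight * prior[i][j]
         for j in range(n)]
        for i in range(n)
    ]


def mean_similarity(sim, a, b):
    """Average pairwise similarity between the members of two clusters."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def average_linkage(sim, k):
    """Agglomerative clustering: merge the most similar pair of clusters
    (by average linkage) until only k clusters remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: mean_similarity(sim, clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Toy data (invented): four genes measured under four conditions.
profiles = [
    [1, 2, 3, 4],  # gene 0
    [2, 4, 6, 8],  # gene 1: co-expressed with gene 0
    [4, 3, 2, 1],  # gene 2: anti-correlated with gene 0
    [4, 3, 1, 2],  # gene 3: anti-correlated with gene 0
]

# Hypothetical prior: genes 0 and 3 are a known interacting pair.
prior = [[0.0] * 4 for _ in range(4)]
prior[0][3] = prior[3][0] = 1.0

corr = correlation_similarity(profiles)
baseline = average_linkage(corr, k=2)
informed = average_linkage(blend(corr, prior, weight=0.8), k=2)
```

In this toy setup the correlation-only run groups genes 0 and 1 together and genes 2 and 3 together, while the blended similarity pulls gene 3 into the cluster containing its known partner, gene 0. The clustering routine itself is identical in both runs; only its input matrix changes, which is the property the abstract emphasizes.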

Full Text