On the Effectiveness of Constraints Sets in Clustering Genes

Erliang Zeng, Chengyong Yang, Tao Li, Giri Narasimhan

doi:10.1109/bibe.2007.4375548

Abstract

In this paper, we modify a constrained clustering algorithm to perform exploratory analysis on gene expression data using prior knowledge presented in the form of constraints. We also study the effectiveness of various constraint sets. To address the problem of automatically generating constraints from biological text literature, we consider two methods (cluster-based and similarity-based). We conclude that incomplete information in the form of a constraint set must be generated carefully if the constrained algorithm is to outperform the standard clustering algorithm, which works on the data source without any constraints. For sufficiently large constraint sets, the constrained clustering algorithm outperformed the MSC algorithm. The novelty of the research presented here lies in the study of the effectiveness of constraint sets and the robustness of the constrained clustering algorithm using multiple sources of biological data, and in the incorporation of biomedical text literature into the constrained clustering algorithm in the form of constraint sets.
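As a rough illustration of what a similarity-based constraint generator could look like, the sketch below turns a pairwise gene similarity matrix into pairwise constraints. It is not the procedure used in the paper: the must-link/cannot-link form of the constraints, the function name `similarity_constraints`, the thresholds `must_thresh` and `cannot_thresh`, and the toy matrix are all assumptions made for the example.

```python
from itertools import combinations


def similarity_constraints(sim, must_thresh=0.8, cannot_thresh=0.2):
    """Derive pairwise constraints from a symmetric similarity matrix.

    Hypothetical helper: the thresholds are illustrative assumptions,
    not values taken from the paper.
    """
    n = len(sim)
    must_link, cannot_link = [], []
    # Visit each unordered pair of genes exactly once.
    for i, j in combinations(range(n), 2):
        if sim[i][j] >= must_thresh:
            # Very similar genes: ask the clusterer to keep them together.
            must_link.append((i, j))
        elif sim[i][j] <= cannot_thresh:
            # Very dissimilar genes: ask the clusterer to separate them.
            cannot_link.append((i, j))
        # Pairs between the two thresholds produce no constraint.
    return must_link, cannot_link


# Toy similarity matrix for three genes (made-up values).
sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.5],
    [0.1, 0.5, 1.0],
]
must_link, cannot_link = similarity_constraints(sim)
```

Leaving the middle band of similarities unconstrained reflects the abstract's caution that constraints should be generated carefully: only pairs with strong evidence either way contribute to the constraint set.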

Full Text