Abstract

Clustering of gene expression data is a standard exploratory technique for identifying closely related genes. Many other sources of data are also likely to be of great assistance in the analysis of gene expression data, and together such data provide a means to begin elucidating the large-scale modular organization of the cell. The authors consider the challenging task of developing exploratory analytical techniques that can handle multiple information sources, both complete and incomplete. For multiple complete sources, they develop the Multi-Source Clustering (MSC) algorithm. For incomplete sources, they adopt the MPCK-means clustering algorithm, which performs exploratory analysis on one complete source while the other, potentially incomplete, sources are supplied in the form of constraints. This paper presents MSC, a new clustering algorithm for exploratory analysis using two or more diverse but complete data sources; studies the effectiveness of constraint sets and the robustness of the constrained clustering algorithm using multiple sources of incomplete biological data; and shows how such incomplete data can be incorporated into a constrained clustering algorithm in the form of constraint sets.
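The abstract does not spell out how constraints from an incomplete source enter the clustering. Purely as an illustration, the sketch below implements a simplified pairwise-constrained k-means in the spirit of PCK-means; MPCK-means additionally learns a distance metric, which is omitted here. It assumes the complete source supplies one feature vector per gene and the incomplete source supplies must-link and cannot-link gene pairs that may cover only some genes. The names `pck_means` and `assignment_cost`, the single penalty weight `w`, and the first-k-points initialization are all hypothetical choices for this sketch, not the authors' implementation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assignment_cost(i, c, points, centroids, labels, ml, cl, w):
    """Distance of point i to centroid c plus penalty w per violated constraint."""
    cost = sq_dist(points[i], centroids[c])
    # Must-link violated: a partner is already labelled with a different cluster.
    cost += sum(w for j in ml[i] if labels[j] is not None and labels[j] != c)
    # Cannot-link violated: a partner is already labelled with the same cluster.
    cost += sum(w for j in cl[i] if labels[j] == c)
    return cost


def pck_means(points, k, must_link=(), cannot_link=(), w=1.0, max_iter=50):
    """Simplified pairwise-constrained k-means (no metric learning).

    points: feature vectors from the complete source, one per gene.
    must_link / cannot_link: index pairs from an incomplete source; genes
    without constraints are clustered by distance alone.
    w: hypothetical single penalty weight for each violated constraint.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")
    # Illustrative deterministic initialization: the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [None] * n
    # Adjacency lists so each point can look up its constraint partners.
    ml = {i: [] for i in range(n)}
    cl = {i: [] for i in range(n)}
    for i, j in must_link:
        ml[i].append(j)
        ml[j].append(i)
    for i, j in cannot_link:
        cl[i].append(j)
        cl[j].append(i)

    for _ in range(max_iter):
        changed = False
        # Assignment step: greedy, point by point, so later points see earlier labels.
        for i in range(n):
            best = min(
                range(k),
                key=lambda c: assignment_cost(i, c, points, centroids, labels, ml, cl, w),
            )
            if best != labels[i]:
                labels[i] = best
                changed = True
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if not changed:
            break
    return labels


if __name__ == "__main__":
    genes = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(pck_means(genes, 2))                              # [0, 0, 1, 1]
    print(pck_means(genes, 2, must_link=[(1, 2)], w=1000))  # [0, 1, 1, 1]
```

In the second call, a strongly weighted must-link from a (hypothetical) incomplete source pulls gene 1 away from its expression-based neighbour into gene 2's cluster; a small `w` would let the expression distances dominate instead. Tuning that trade-off is one reason the robustness of constraint sets is worth studying.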
