Abstract

Recent advances in high-throughput technologies have made it possible to collect large amounts of multidimensional heterogeneous data that provide diverse information on the same biological samples. Integrative analysis of such multisource datasets may reveal new insights into complex biological mechanisms and therefore remains an important research field in systems biology. Most modern integrative clustering approaches rely either on independent analysis of each dataset followed by consensus clustering or on probabilistic or statistical modeling, whereas flexible distance-based integrative clustering techniques have received little attention. We propose two distance-based integrative clustering frameworks based on bi-level and bi-objective extensions of the p-median problem. A hybrid branch-and-cut method is developed to find globally optimal solutions to the bi-level p-median model. For the bi-objective problem, an ε-constraint algorithm is proposed to generate an approximation to the Pareto optimal set. Every solution found by either framework corresponds to an integrative clustering. We present an application of our approaches to integrative analysis of NCI-60 human tumor cell lines characterized by gene expression and drug activity profiles. We demonstrate that the proposed mathematical optimization-based approaches outperform some state-of-the-art and traditional distance-based integrative and non-integrative clustering techniques.
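To make the ε-constraint idea concrete, the following is a minimal sketch on toy data and not the algorithm developed in the paper. It assumes, for illustration only, that each data source contributes one integer distance matrix (the hypothetical `D1` and `D2`), that a solution is a set of p medians plus a single shared assignment of samples to medians, and that the two objectives are the total assignment distances under each source. Brute-force enumeration stands in for the integer programming solver a realistic instance would require; all function and variable names are hypothetical.

```python
from itertools import combinations, product

def enumerate_solutions(d1, d2, p):
    """Yield (f1, f2, medians, assignment) for every clustering with p medians.

    Assumption: one shared assignment is scored under both distance matrices.
    """
    n = len(d1)
    for medians in combinations(range(n), p):
        others = [i for i in range(n) if i not in medians]
        # Each non-median sample may be assigned to any of the chosen medians.
        for choice in product(medians, repeat=len(others)):
            assign = {m: m for m in medians}  # a median serves itself
            assign.update(zip(others, choice))
            f1 = sum(d1[i][assign[i]] for i in range(n))
            f2 = sum(d2[i][assign[i]] for i in range(n))
            yield f1, f2, medians, assign

def epsilon_constraint(d1, d2, p, step=1):
    """Minimize f1 subject to f2 <= eps, tightening eps after each solve.

    With integer distances and step=1, no Pareto point is skipped.
    """
    solutions = list(enumerate_solutions(d1, d2, p))
    front = []
    eps = float("inf")
    while True:
        feasible = [s for s in solutions if s[1] <= eps]
        if not feasible:
            break
        # Lexicographic tie-break on f2 avoids weakly dominated points.
        best = min(feasible, key=lambda s: (s[0], s[1]))
        front.append(best)
        eps = best[1] - step  # force a strict improvement in f2 next round
    return front

# Toy data (hypothetical): five samples with one coordinate per source.
x1 = [0, 1, 5, 6, 10]
x2 = [0, 7, 1, 8, 2]
D1 = [[abs(a - b) for b in x1] for a in x1]
D2 = [[abs(a - b) for b in x2] for a in x2]

if __name__ == "__main__":
    for f1, f2, medians, assign in epsilon_constraint(D1, D2, p=2):
        print(f1, f2, medians, assign)
```

Each tuple printed is one non-dominated trade-off between the two sources, and its assignment is one candidate integrative clustering. The lexicographic tie-break is a design choice: without it, a plain ε-constraint solve can return points that are only weakly Pareto optimal.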

