Abstract

Clustering is a common technique for statistical data analysis and is essential for developing precision medicine. Numerous computational methods have been proposed for integrating multi-omics data to identify cancer subtypes. However, most existing clustering models based on network fusion fail to preserve the consistency of the data distribution before and after fusion. Motivated by this observation, we sought to measure and minimize the distribution difference between networks, which may not lie in the same space, in order to improve the performance of data fusion. We propose SMCC, a flexible co-regularized network fusion model for clustering single- and multi-omics data to identify cancer or cell subtypes. SMCC integrates low-rank subspace representation and entropy to fuse networks. In addition, it measures and minimizes the distribution difference between the similarity networks and the fused network by co-regularization. The model can thus both reduce noise interference in the source data and bring the statistical characteristics of the fusion result closer to those of the source data. We evaluated the clustering performance of SMCC on 16 real single- and multi-omics datasets. The experimental results demonstrated that SMCC outperforms 17 state-of-the-art clustering methods. Moreover, it is effective for identifying cancer or cell subtypes, thereby promoting the development of precision medicine.
