Abstract

Background: In recent years, various sequencing techniques have been used to collect biomedical omics datasets, and it is usually possible to obtain multiple types of omics data from a single patient sample. Clustering of omics data plays an indispensable role in biological and medical research and helps reveal data structures across multiple collections. Nevertheless, clustering omics data poses many challenges, chiefly the high dimensionality of the data and the small sample size. It is therefore difficult to find a suitable integration method for structural analysis of multiple datasets.

Results: In this paper, we propose a multi-view clustering method based on the Stiefel manifold (MCSM). MCSM comprises three core steps. First, we establish a binary optimization model for the simultaneous clustering problem. Second, we solve the optimization problem with a linear search algorithm based on the Stiefel manifold. Finally, we integrate the clustering results obtained from three omics data types using the k-nearest neighbor method. We applied this approach to four cancer datasets from TCGA. The results show that our method is superior to several state-of-the-art methods, which depend on the hypothesis that the underlying cluster structure is the same across omics.

Conclusion: In particular, our approach performs better than the compared approaches when the underlying clusters are inconsistent. For patients with different subtypes, both consistent and differential clusters can be identified at the same time.
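To make the second step more concrete, the following is a minimal sketch of gradient descent with a backtracking search on the Stiefel manifold (the set of matrices X with X^T X = I). The abstract does not state the MCSM objective, so everything specific here is an assumption for illustration: the objective (maximizing trace(X^T A X) for a symmetric similarity matrix A, as in relaxed spectral clustering), the QR retraction, the Armijo backtracking rule, and the function names `qr_retraction`, `riemannian_grad`, and `stiefel_line_search`. It should not be read as the authors' algorithm.

```python
import numpy as np

def qr_retraction(Y):
    """Map an n x k matrix back onto the Stiefel manifold (orthonormal columns)."""
    Q, R = np.linalg.qr(Y)
    # Fix column signs so the result is unique (positive diagonal of R).
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs

def riemannian_grad(X, G):
    """Project the Euclidean gradient G onto the tangent space at X."""
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2.0

def stiefel_line_search(A, k, iters=200, step0=1.0, shrink=0.5, c=1e-4, seed=0):
    """Minimize f(X) = -trace(X^T A X) subject to X^T X = I.

    The objective is an illustrative assumption, not the MCSM model.
    A is assumed to be symmetric.
    """
    rng = np.random.default_rng(seed)
    X = qr_retraction(rng.standard_normal((A.shape[0], k)))

    def f(Z):
        return -np.trace(Z.T @ A @ Z)

    for _ in range(iters):
        # Euclidean gradient of f is -2 A X for symmetric A.
        grad = riemannian_grad(X, -2.0 * A @ X)
        gnorm2 = np.sum(grad * grad)
        if gnorm2 < 1e-12:
            break
        step, accepted = step0, False
        # Backtracking (Armijo) search along the negative gradient direction.
        while step > 1e-12:
            X_new = qr_retraction(X - step * grad)
            if f(X_new) <= f(X) - c * step * gnorm2:
                accepted = True
                break
            step *= shrink
        if not accepted:
            break
        X = X_new
    return X
```

In a spectral-style pipeline, the rows of the returned matrix would typically be clustered afterwards (for example with k-means) to obtain discrete labels; how MCSM performs that step and the subsequent k-nearest neighbor integration is not specified in the abstract.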

Highlights

  • One of the challenges of cancer treatment is how to identify tumor subtypes, which can help to provide patients with specific treatment

  • We propose a Multi-view Clustering based on Stiefel Manifold (MCSM) method for multi-view clustering problems with potential clusters

  • We use simulated datasets to verify that the MCSM method is suitable for datasets with an uneven distribution of underlying clusters

Summary

Introduction

One of the challenges of cancer treatment is identifying tumor subtypes, which can help provide patients with specific treatment. Integrating different types of omics data to unravel the molecular mechanisms of complex diseases is becoming increasingly important [2]. Multiple omics data from different cancer subtypes provide more detailed information. Different levels of multi-omics data are often of different types. It is usually possible to obtain multiple types of omics data from a single patient sample. Clustering of omics data plays an indispensable role in biological and medical research and helps reveal data structures across multiple collections. The primary challenges in omics data analysis come from the high dimensionality of the data and the small sample size. It is therefore difficult to find a suitable integration method for structural analysis of multiple datasets.
