Abstract
Background
Omics profiling is now a routine component of biomedical studies. In the analysis of omics data, clustering is an essential step and serves multiple purposes, including, for example, revealing the unknown functionalities of omics units and assisting dimension reduction in outcome model building. In the most recent omics studies, a prominent trend is to conduct multilayer profiling, which collects multiple types of genetic, genomic, epigenetic and other measurements on the same subjects. In the literature, clustering methods tailored to multilayer omics data are still limited. Directly applying the existing clustering methods to multilayer omics data, and clustering each layer first and then combining across layers, are both "suboptimal" in that they do not accommodate the interconnections within layers and across layers in an informative way.
Methods
In this study, we develop the MuNCut (Multilayer NCut) clustering approach. It is tailored to multilayer omics data and sufficiently accounts for both across- and within-layer connections. It is based on the novel NCut technique and also takes advantage of regularized sparse estimation. It has an intuitive formulation and is computationally feasible. To facilitate implementation, we develop the function muncut in the R package NcutYX.
Results
Under a wide spectrum of simulation settings, MuNCut outperforms the competing methods. The analysis of TCGA (The Cancer Genome Atlas) data on breast cancer and cervical cancer shows that MuNCut generates biologically meaningful results that differ from those obtained with the alternatives.
Conclusions
We propose a more effective approach to clustering analysis of multilayer omics data. It provides a new avenue for jointly analyzing genetic, genomic, epigenetic and other measurements.
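The abstract says that MuNCut builds on the NCut (normalized cut) criterion and accounts for both within- and across-layer connections, but it does not give the formulation. The sketch below illustrates only that general idea. It is not the MuNCut algorithm and not the `muncut` function in NcutYX. It rests on several assumptions. Similarity between omics units is taken to be the absolute Pearson correlation. Within-layer and across-layer similarities are stacked into one block matrix, with a hypothetical weight `alpha` on the across-layer block. The split is the standard Shi and Malik spectral relaxation of a two-way normalized cut. The regularized sparse estimation used by MuNCut is omitted, and all helper names here are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def abs_corr_similarity(A, B):
    """Absolute Pearson correlation between columns of A (n x p) and B (n x q).

    Assumption for illustration: omics units are columns, subjects are rows.
    """
    A = (A - A.mean(axis=0)) / A.std(axis=0)
    B = (B - B.mean(axis=0)) / B.std(axis=0)
    return np.abs(A.T @ B) / A.shape[0]


def multilayer_similarity(X, Z, alpha=1.0):
    """Block similarity matrix: within-layer blocks on the diagonal,
    across-layer block (scaled by the hypothetical weight alpha) off the diagonal."""
    Wxx = abs_corr_similarity(X, X)
    Wzz = abs_corr_similarity(Z, Z)
    Wxz = alpha * abs_corr_similarity(X, Z)
    W = np.block([[Wxx, Wxz], [Wxz.T, Wzz]])
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def two_way_ncut(W):
    """Spectral relaxation of a two-way normalized cut (Shi and Malik).

    Solves (D - W) y = lambda * D y and splits nodes by the sign of the
    eigenvector for the second-smallest eigenvalue.
    """
    D = np.diag(W.sum(axis=1))
    _, vecs = eigh(D - W, D)  # generalized symmetric problem, eigenvalues ascending
    return (vecs[:, 1] > 0).astype(int)


# Toy two-layer data: 200 subjects, 4 units per layer, two shared latent signals.
rng = np.random.default_rng(0)
n = 200
f1, f2 = rng.standard_normal(n), rng.standard_normal(n)
X = np.column_stack([f1, f1, f2, f2]) + 0.3 * rng.standard_normal((n, 4))  # e.g. CNV layer
Z = np.column_stack([f1, f1, f2, f2]) + 0.3 * rng.standard_normal((n, 4))  # e.g. GE layer
labels = two_way_ncut(multilayer_similarity(X, Z))
print(labels)  # units 0-3 come from X, units 4-7 from Z
```

In this toy example, the first two units of each layer should fall in one cluster and the last two units of each layer in the other. The across-layer block lets units from different layers join the same cluster. Clustering each layer separately would instead produce per-layer labels that still have to be matched across layers. The 0/1 cluster labels themselves are arbitrary, because the sign of an eigenvector is arbitrary.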
Highlights
Omics profiling is a routine component of biomedical studies
The bottom layer consists of copy number variations (CNVs), the middle layer consists of gene expressions (GEs), and the upper layer consists of proteins
A small number of CNVs in the bottom layer regulate a small number of GEs in the middle layer, which encode a small number of proteins in the upper layer
Summary
Omics profiling is a routine component of biomedical studies. In the analysis of omics data, clustering is an essential step and serves multiple purposes, including, for example, revealing the unknown functionalities of omics units and assisting dimension reduction in outcome model building. Directly applying the existing clustering methods to multilayer omics data, and clustering each layer first and then combining across layers, are both "suboptimal" in that they do not accommodate the interconnections within layers and across layers in an informative way. Clustering results can be used in multiple ways. They can suggest the unknown functionalities of omics units, with units in the same cluster likely to have related biological functions [1].