Abstract
The molecular mechanisms and functions of complex biological systems remain elusive. Recent high-throughput techniques, such as next-generation sequencing, have generated a wide variety of multiomics datasets that enable the identification of biological functions and mechanisms from multiple facets. However, integrating these large-scale multiomics data and discovering functional insights remain challenging tasks. To address these challenges, machine learning has been broadly applied to analyze multiomics data. This review introduces multiview learning, an emerging machine learning field, and envisions its potentially powerful applications to multiomics. In particular, multiview learning is more effective than previous integrative methods at learning the heterogeneity of data and revealing cross-talk patterns. Although it has been applied in various contexts, such as computer vision and speech recognition, multiview learning has not yet been widely applied to biological data, and to multiomics data in particular. Therefore, this paper first reviews recent multiview learning methods and unifies them in a framework called multiview empirical risk minimization (MV-ERM). We further discuss the potential applications of each method to multiomics, including genomics, transcriptomics, and epigenomics, with the aim of discovering functional and mechanistic interpretations across omics. Second, we explore possible applications to different biological systems, including human diseases (e.g., brain disorders and cancers), plants, and single-cell analysis, and discuss both the benefits and caveats of using multiview learning to discover the molecular mechanisms and functions of these systems.
Highlights
Hierarchical complexity is the nature of all biological phenomena and processes
Multiview learning has a long history [4], and many literature reviews have been produced on this topic: Li and colleagues [83] focus on multiview representation learning methods; Zhao and colleagues [84], Sun [85, 86], and Sun and colleagues [87] focus on theoretical aspects (namely, generalization bounds) of some earlier paradigms of multiview learning; Xu and colleagues [16] produced one of the first reviews to discuss the consensus and complementary principles of multiview learning extensively; Chao and colleagues [88] categorize multiview clustering methods into generative and discriminative methods; and Baltrusaitis and colleagues [89] conducted a comprehensive survey that categorizes multiview learning methods according to 5 technical challenges: representation, translation, alignment, fusion, and co-learning
Applications of multiview learning to biomedical data have only recently been investigated [90, 91], and several surveys have examined methods for integrating heterogeneous biological and multiomics data [92, 93, 94, 91]. However, these surveys did not discuss the machine learning principles underlying multiview learning (e.g., empirical risk minimization (ERM)) or how to use these principles to model multiomics data and reveal functional omics. Unlike these reviews, we focused on the basic principle underlying all machine learning algorithms (i.e., ERM) and built alignment-based and factorization-based frameworks for multiview learning on that principle
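To make the ERM view of multiview learning concrete, here is a minimal sketch, assuming linear predictors, a squared loss, and a ridge penalty, of a supervised multiview objective with a pairwise symmetric alignment co-regularizer: each view gets its own weight vector, every view is fit to the shared label `y`, and a penalty weighted by `lam` pulls the views' predictions toward each other. The function names (`fit_coregularized`, `objective`), the hyperparameters `lam` and `mu`, and the synthetic data are hypothetical illustrations, not the MV-ERM formulation of this review.

```python
import numpy as np


def fit_coregularized(views, y, lam=1.0, mu=1e-3, n_iter=200):
    """Fit one linear predictor per view by exact block coordinate descent on
        sum_v (1/n)||X_v w_v - y||^2                    (per-view empirical risk)
      + lam * sum_{u<v} (1/n)||X_u w_u - X_v w_v||^2    (pairwise symmetric co-regularizer)
      + mu * sum_v ||w_v||^2                            (ridge penalty)
    Hypothetical helper for illustration only.
    """
    n, V = y.shape[0], len(views)
    weights = [np.zeros(X.shape[1]) for X in views]
    for _ in range(n_iter):
        for v, X in enumerate(views):
            # Sum of the other views' predictions, held fixed during this block update.
            others = sum(views[u] @ weights[u] for u in range(V) if u != v)
            # Setting the gradient w.r.t. w_v to zero gives this linear system.
            A = (1 + lam * (V - 1)) * (X.T @ X) / n + mu * np.eye(X.shape[1])
            b = X.T @ (y + lam * others) / n
            weights[v] = np.linalg.solve(A, b)
    return weights


def objective(views, y, weights, lam=1.0, mu=1e-3):
    """Evaluate the co-regularized multiview objective minimized above."""
    n = y.shape[0]
    preds = [X @ w for X, w in zip(views, weights)]
    risk = sum(np.sum((p - y) ** 2) for p in preds) / n
    align = sum(np.sum((preds[u] - preds[v]) ** 2)
                for u in range(len(preds)) for v in range(u + 1, len(preds))) / n
    ridge = sum(np.sum(w ** 2) for w in weights)
    return risk + lam * align + mu * ridge


# Toy usage on synthetic data (not real omics): two views share a latent signal z.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X1 = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=n)])
X2 = np.column_stack([rng.normal(size=n), -z + 0.5 * rng.normal(size=n)])
y = z + 0.1 * rng.normal(size=n)

w_fit = fit_coregularized([X1, X2], y, lam=1.0)
print(objective([X1, X2], y, w_fit, lam=1.0))
```

Each block update solves the per-view normal equations exactly with the other views held fixed, so the objective never increases between sweeps; setting `lam=0` reduces the sketch to independent ridge regressions, one per view, while larger `lam` trades per-view fit for agreement across views.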
Summary
Hierarchical complexity is the nature of all biological phenomena and processes. Although biological systems are made of physical entities (e.g., atoms), the phenomena and interactions of biological entities such as DNA and proteins, among others, possess emergent properties that cannot be reduced to or explained by physical laws; this has kept the biological sciences more descriptive than predictive for a long time. Through the formal definitions of machine learning identified previously, its applications in biological domains can be regarded as abstracting out a representation f(X) of a single data type X, where X can be, for example, a gene expression profile. This representation captures the interactions of elements (i.e., genes) within X and the phenotypic manifestation Y (e.g., cancer) resulting from those interactions.
This co-regularizer serves as a pairwise symmetric alignment function across all views to coordinate the information among them. This multiview framework is based on the supervised setting of single-view machine learning in.
The aim is to find 2 linear projections, $F^{(1)}$ and $F^{(2)}$, such that the cross correlation across the 2 views is maximized:
$$(F^{(1)}, F^{(2)}) \in \arg\min_{F^{(1)},\, F^{(2)}} \cdots$$
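Finding 2 linear projections that maximize the cross correlation between 2 views is the classical canonical correlation analysis (CCA) setting. The sketch below assumes centered data, a small ridge term `reg` added to each within-view covariance for numerical stability, and the standard whitening-plus-SVD solution: the singular values of the whitened cross-covariance are the canonical correlations, and mapping the singular vectors back gives the columns of $F^{(1)}$ and $F^{(2)}$. The names `cca` and `inv_sqrt` and the toy data are hypothetical; this is one textbook solver, not necessarily the exact formulation used in the review.

```python
import numpy as np


def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix (hypothetical helper)."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs * vals ** -0.5) @ vecs.T


def cca(X1, X2, k=1, reg=1e-6):
    """Return projections F1 (d1 x k), F2 (d2 x k) and the top-k canonical correlations."""
    X1 = X1 - X1.mean(axis=0)  # center each view
    X2 = X2 - X2.mean(axis=0)
    n = X1.shape[0]
    C11 = X1.T @ X1 / (n - 1) + reg * np.eye(X1.shape[1])  # regularized within-view covariance
    C22 = X2.T @ X2 / (n - 1) + reg * np.eye(X2.shape[1])
    C12 = X1.T @ X2 / (n - 1)                              # cross-view covariance
    K1, K2 = inv_sqrt(C11), inv_sqrt(C22)
    # SVD of the whitened cross-covariance; singular values = canonical correlations.
    U, s, Vt = np.linalg.svd(K1 @ C12 @ K2, full_matrices=False)
    F1 = K1 @ U[:, :k]
    F2 = K2 @ Vt.T[:, :k]
    return F1, F2, s[:k]


# Toy usage on synthetic data: both views contain a noisy copy of the same latent signal z.
rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)
X1 = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)])
X2 = np.column_stack([rng.normal(size=n), 2 * z + 0.5 * rng.normal(size=n)])

F1, F2, corrs = cca(X1, X2, k=1)
Z1 = (X1 - X1.mean(axis=0)) @ F1  # canonical variates of view 1
Z2 = (X2 - X2.mean(axis=0)) @ F2  # canonical variates of view 2
print(corrs[0], np.corrcoef(Z1[:, 0], Z2[:, 0])[0, 1])
```

The projected variates have (approximately) unit variance, and their correlation approximately equals `corrs[0]`; the small gap comes only from the `reg` term.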