Abstract

Advances in next-generation sequencing and high-throughput techniques have enabled the generation of vast amounts of diverse omics data. These data provide unprecedented opportunities in biology but pose great challenges in data integration, data mining, and knowledge discovery because of the complexity, heterogeneity, dynamics, uncertainty, and high dimensionality inherent in omics data. Networks have been widely used to represent relations between entities in biological systems, such as protein-protein interactions, gene regulation, and brain connectivity (i.e., network construction), as well as to infer novel relations from a reconstructed network (i.e., link prediction). In particular, the heterogeneous multi-layered network (HMLN) has proven successful in integrating diverse biological data to represent the hierarchy of biological systems. The HMLN provides unparalleled opportunities but imposes new computational challenges for establishing causal genotype-phenotype associations and understanding environmental impacts on organisms. In this review, we focus on recent advances in computational methods for inferring novel biological relations from the HMLN. First, we discuss the properties of biological HMLNs. Second, we survey four categories of state-of-the-art methods (matrix factorization, random walk, knowledge graph, and deep learning). Third, we demonstrate their applications to omics data integration and analysis. Finally, we outline strategies for future directions in the development of new HMLN models.

Highlights

  • A fundamental task in biological studies is to identify relations, whether dynamic functional associations or physical interactions, between various chemical and biological entities

  • We focus on the cross-domain relation inference problem for the heterogeneous multi-layered network (HMLN)

  • We summarize the recent advances in the development of cross-layer relation inference algorithms for the HMLN, and their applications to biological discovery

Summary

Frontiers in Genetics

Advances in next-generation sequencing and high-throughput techniques have enabled the generation of vast amounts of diverse omics data. These data provide unprecedented opportunities in biology but pose great challenges in data integration, data mining, and knowledge discovery because of the complexity, heterogeneity, dynamics, uncertainty, and high dimensionality inherent in omics data. We focus on recent advances in computational methods for inferring novel biological relations from the heterogeneous multi-layered network (HMLN). We survey four categories of state-of-the-art methods (matrix factorization, random walk, knowledge graph, and deep learning) and demonstrate their applications to omics data integration and analysis.

INTRODUCTION
CHARACTERISTICS OF BIOLOGICAL HMLN
Sparsity and Imbalance
ALGORITHMS FOR RELATION INFERENCE IN HMLN
Matrix Factorization
Node homophily
Random Walk
Graph Neural Network and Other Deep Learning Techniques
Application of HMLN in Omics Data Integration and Analysis
Representation of Biological Hierarchy and Environment
Data Consolidation and Normalization
Inference of Directionality and Trend of Relations
Incorporation of Ontology
Sampling of Negative Relations
Visualizing HMLN