Extract Biological Knowledge Research Articles

Genome-wide association studies (GWAS) have successfully discovered a number of disease-associated genetic variants in the past decade, providing an unprecedented opportunity for deciphering the genetic basis of human inherited diseases. However, extracting biological knowledge from GWAS data remains challenging, owing to issues such as missing heritability and weak interpretability. Indeed, the majority of discovered loci fall into noncoding regions without clear links to genes, which has hindered the characterization of their functions and calls for a sophisticated approach to bridge genetic and genomic studies. To address this problem, network-based prioritization of candidate genes, which performs integrated analysis of gene networks with GWAS data, has emerged as a promising direction and attracted much attention. However, most existing methods overlook the sparse and noisy properties of gene networks and thus may lead to suboptimal performance. Motivated by this understanding, we proposed a novel method called REGENT for integrating multiple gene networks with GWAS data to prioritize candidate genes for complex diseases. We leveraged network representation learning to embed a gene network into a compact and robust feature space, and then designed a hierarchical statistical model to integrate features of multiple gene networks with GWAS data for the effective inference of genes associated with a disease of interest. We applied our method to six complex diseases and demonstrated the superior performance of REGENT over existing approaches in recovering known disease-associated genes. We further conducted a pathway analysis and demonstrated the ability of REGENT to discover disease-associated pathways. We expect to see applications of our method to a broad spectrum of diseases for post-GWAS analysis. REGENT is freely available at https://github.com/wmmthu/REGENT.
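
The idea of embedding each gene network into a compact feature space and then combining features across networks can be illustrated with a minimal sketch. This is not the REGENT implementation: a simple Laplacian spectral embedding stands in for whichever representation learning method the authors use, the two toy networks and the names `spectral_embedding` and `edges_to_adjacency` are hypothetical, and the hierarchical statistical model that integrates the features with GWAS data is not shown.

```python
import numpy as np

def edges_to_adjacency(edges, n):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj

def spectral_embedding(adj, dim):
    """Embed the nodes of a connected graph using Laplacian eigenvectors.

    Stand-in for network representation learning (an assumption, not
    the method used by REGENT). Returns an (n_nodes, dim) feature matrix.
    """
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj
    # eigh returns eigenvalues in ascending order; column 0 is the
    # constant eigenvector of a connected graph, so it is skipped.
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, 1:dim + 1]

# Two hypothetical networks over the same six genes.
# Network A: two tightly connected modules joined by one edge.
net_a = edges_to_adjacency(
    [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)], 6)
# Network B: a simple chain of interactions.
net_b = edges_to_adjacency([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)], 6)

# One feature block per network, concatenated gene by gene.
features = np.hstack([spectral_embedding(net_a, 2),
                      spectral_embedding(net_b, 2)])
print(features.shape)
```

In this toy case the first embedding coordinate of network A separates the two modules by sign, which is the kind of compact structural signal a downstream model could combine with gene-level GWAS evidence.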

The conversion of pentose to ethanol is one of the major barriers to industrializing lignocellulosic ethanol processes. As one of the most promising native strains for pentose fermentation, Scheffersomyces stipitis (formerly known as Pichia stipitis) has been widely studied for its xylose fermentation. In spite of the abundant experimental evidence regarding ethanol and byproduct production under various aeration conditions, mathematical descriptions of these processes are rare. In this work, a constraint-based metabolic network model for the central carbon metabolism of S. stipitis was reconstructed by integrating genomic (S. stipitis v2.0, KEGG), biochemical (ChEBI, PubChem), and physiological information available for this microorganism and other related yeasts. The model consists of the stoichiometry of metabolic reactions, biosynthetic requirements for growth, and other constraints. Flux balance analysis is applied to characterize the phenotypic behavior of S. stipitis grown on xylose. The model predictions are in good agreement with published experimental results. To understand the effect of redox balance on xylose fermentation, we propose a system identification-based metabolic analysis framework to extract biological knowledge embedded in a series of designed in silico experiments. In the proposed framework, we first design in silico experiments that perturb the metabolic network in order to investigate the properties of interest, and then perform system identification by applying principal component analysis (PCA) to the data generated by those experiments. By combining the in silico perturbation experiments with system identification tools, biologically meaningful information contained in the complex network structure can be decomposed and translated into easily interpretable information that is useful to biologists. The PCA identifies the phenotypic changes caused by oxygen supply and reveals key metabolic reactions related to redox homeostasis in different phenotypes. In addition, the influence of the cofactor preference of a key enzyme (xylose reductase) in xylose metabolism is investigated using the proposed approach, and the results provide important insights into cofactor engineering of xylose metabolism.
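
The framework described above (perturb the network in silico, solve a flux balance problem for each perturbation, then apply PCA to the resulting fluxes) can be sketched on a toy example. Everything below is an assumption made for illustration: the five-reaction network, its stoichiometry, the fixed uptake rate, and the names `run_fba` and `oxygen_levels` are hypothetical and are not taken from the S. stipitis model.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (NOT the S. stipitis model). Columns are reactions.
reactions = ["uptake", "catabolism", "fermentation", "respiration", "growth"]
# Rows are the internal metabolites A, B and the redox carrier NADH.
S = np.array([
    [1, -1,  0,  0,  0],   # A:    made by uptake, used by catabolism
    [0,  1, -1,  0, -1],   # B:    made by catabolism, used by fermentation and growth
    [0,  1, -1, -1,  0],   # NADH: made by catabolism, reoxidized by fermentation or respiration
], dtype=float)

def run_fba(o2_max, uptake=10.0):
    """Maximize the growth flux at steady state (S v = 0) for one oxygen limit."""
    c = np.zeros(len(reactions))
    c[-1] = -1.0  # linprog minimizes, so the growth objective is negated
    bounds = [(uptake, uptake),  # substrate uptake fixed (assumption)
              (0, None),         # catabolism
              (0, None),         # fermentation
              (0, o2_max),       # respiration capped by oxygen supply
              (0, None)]         # growth
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x

# Designed in silico experiment: sweep the oxygen supply.
oxygen_levels = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
fluxes = np.array([run_fba(o2) for o2 in oxygen_levels])

# PCA via SVD of the mean-centered flux matrix (rows are experiments).
centered = fluxes - fluxes.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)
loadings = components[0]  # weight of each reaction in the first principal component

for name, weight in zip(reactions, loadings):
    print(f"{name:>12s}: {weight:+.3f}")
```

In this toy network a single principal component captures the whole response to oxygen, and its loadings are nonzero only for fermentation, respiration, and growth, the reactions tied to reoxidizing NADH. A genome-scale model would typically need several components and a more careful perturbation design.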

Articles published on Extract Biological Knowledge

A FAIR and modular image‐based workflow for knowledge discovery in the emerging field of imageomics

SHARE-Topic: Bayesian interpretable modeling of single-cell multi-omic data

Dex-Benchmark: datasets and code to evaluate algorithms for transcriptomics data analysis.

A Novel Calibration Step in Gene Co-Expression Network Construction.

A Novel Fuzzy Frequent Itemsets Mining Approach for the Detection of Breast Cancer

PyHIST: A Histological Image Segmentation Tool.

Detection of Breast Cancer Based on Fuzzy Frequent Itemsets Mining

Extraction of Biological Knowledge by Clustering Data Mining Techniques

Leveraging multiple gene networks to prioritize GWAS candidate genes via network representation learning.

Understanding Human Autoimmunity and Autoinflammation Through Transcriptomics.

Editorial: Protein Interaction Networks in Health and Disease.

TAKES: Two-step Approach for Knowledge Extraction in Biomedical Digital Libraries

Mining gene-centric relationships from literature: the roles of gene mutation and gene expression in supporting drug discovery.

Elucidating Xylose Metabolism of Scheffersomyces stipitis for Lignocellulosic Ethanol Production

PARSEC: PAtteRn SEarch and Contextualization

O-miner: an integrative platform for automated analysis and mining of -omics data

Finding a Needle in the Virus Metagenome Haystack - Micro-Metagenome Analysis Captures a Snapshot of the Diversity of a Bacteriophage Armoire

Efficient Computational Construction of Weighted Protein-Protein Interaction Networks Using Adaptive Filtering Techniques Combined with Natural Selection-Based Heuristic Algorithms

Epigenetic Control of Reprogramming and Cellular Differentiation

A Data-Driven Integrative Model of Sepal Primordium Polarity in Arabidopsis
