Large-scale Metabolomics Data Research Articles

One central goal of systems biology is to infer biochemical regulation from large-scale omics data. Many aspects of cellular physiology and organismal phenotypes can be understood as results of the dynamics of metabolic interaction networks. Previously, we proposed a mathematical method that addresses this problem by using metabolomics data for the inverse calculation of biochemical Jacobian matrices, which reveal regulatory checkpoints. The algorithms proposed for this inference are limited by two issues: they rely on structural network information that must be assembled manually, and they are numerically unstable because the regression problems become ill-conditioned for large-scale metabolic networks. To address these problems, we developed a novel regression loss-based inverse Jacobian algorithm that combines metabolomics COVariance and genome-scale metabolic RECONstruction and allows a fully automated, algorithmic implementation of the COVRECON workflow. The workflow consists of two parts: (i) Sim-Network and (ii) inverse differential Jacobian evaluation. Sim-Network automatically generates an organism-specific enzyme and reaction dataset from the BiGG and KEGG databases, which is then used to reconstruct the Jacobian's structure for a specific metabolomics dataset. Instead of directly solving a regression problem as in the previous workflow, the new inverse differential Jacobian evaluation uses a substantially more robust approach and rates the biochemical interactions according to their relevance, as estimated from large-scale metabolomics data. The approach is illustrated by in silico stochastic analysis of differently sized metabolic networks from the BioModels database and is applied to a real-world example.
The characteristics of the COVRECON implementation are that (i) it automatically reconstructs a data-driven superpathway model; (ii) more general network structures can be investigated; and (iii) the new inverse algorithm improves stability, decreases computation time, and extends to large-scale models. The code is available at https://bitbucket.org/mosys-univie/covrecon.
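
The abstract does not state the equation that links the Jacobian to the metabolomics covariance. As a minimal sketch, the code below assumes the Lyapunov-type relation J C + C J^T = -2 D that is commonly used in covariance-based Jacobian inference, where J is the Jacobian, C the covariance matrix and D a fluctuation matrix. It shows only the forward direction (computing C from a known J and D) for a hypothetical two-metabolite system. The inverse problem that COVRECON addresses runs the other way and is not reproduced here. All function names and matrix values are illustrative assumptions, not part of the COVRECON code.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def covariance_from_jacobian(jac, fluct):
    """Solve J C + C J^T = -2 D for C (assumed relation, see lead-in)."""
    n = len(jac)
    a = [[0.0] * (n * n) for _ in range(n * n)]
    b = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            row = i * n + j  # one linear equation per entry (i, j)
            for k in range(n):
                a[row][k * n + j] += jac[i][k]  # term (J C)[i][j]
                a[row][i * n + k] += jac[j][k]  # term (C J^T)[i][j]
            b[row] = -2.0 * fluct[i][j]
    flat = solve_linear(a, b)
    return [flat[i * n:(i + 1) * n] for i in range(n)]


def lyapunov_residual(jac, cov, fluct):
    """Largest absolute entry of J C + C J^T + 2 D."""
    n = len(jac)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            lhs = sum(jac[i][k] * cov[k][j] + cov[i][k] * jac[j][k]
                      for k in range(n))
            worst = max(worst, abs(lhs + 2.0 * fluct[i][j]))
    return worst


# Hypothetical stable Jacobian and unit fluctuation matrix.
J = [[-1.0, 0.0], [1.0, -2.0]]
D = [[1.0, 0.0], [0.0, 1.0]]
C = covariance_from_jacobian(J, D)
residual = lyapunov_residual(J, C, D)
```

The linear system here grows as the square of the number of metabolites, so this brute-force solve is only practical for toy networks. It is meant to make the assumed relation concrete, not to suggest how the published algorithm scales.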


Transcriptomics and metabolomics data often contain missing values or outliers because of limitations of the data acquisition techniques. Most statistical methods require complete datasets for downstream analysis. A number of methods have been developed for missing value imputation using the classical mean and variance based on maximum likelihood estimators, which are not robust against outliers. Consequently, the performance of these methods deteriorates in the presence of outliers. Precise imputation of missing values and handling of outliers are therefore equally important. In this paper, we developed a robust iterative approach that uses robust estimators based on the minimum beta divergence method and simultaneously imputes missing values and handles outliers. We compare the performance of the proposed method with six frequently used missing value imputation methods, namely Zero, KNN, robust SVD, EM, random forest (RF) and the weighted least square approach (WLSA), through feature selection on both simulated and real datasets. Ten performance indices were used to identify the best method: Frobenius norm (FOBN), accuracy (ACC), sensitivity (SN), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), detection rate (DR), misclassification error rate (MER), the area under the ROC curve (AUC) and computational runtime. Evaluation on both simulated and real data suggests that the proposed method outperforms the other traditional methods across various rates of outliers and missing values. In the absence of outliers, it performs almost as well as the other methods. The proposed method is accurate, simple, and requires less computation time than the other methods. We therefore recommend the proposed procedure for large-scale transcriptomics and metabolomics data analysis.
The computational tool has been implemented in an R package, which is publicly available from https://CRAN.R-project.org/package=rMisbeta.
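
Several of the performance indices listed above can be computed from a confusion matrix. The sketch below assumes the standard textbook definitions of ACC, SN, SP, PPV, NPV and MER, plus the Frobenius norm of the difference between a true and an imputed matrix. The abstract does not spell out its formulas, so these definitions, the function names and the example counts are assumptions and are not taken from the rMisbeta package.

```python
import math


def classification_indices(tp, fp, tn, fn):
    """Standard confusion-matrix indices (assumed definitions)."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    return {
        "ACC": acc,                # accuracy
        "SN": tp / (tp + fn),      # sensitivity
        "SP": tn / (tn + fp),      # specificity
        "PPV": tp / (tp + fp),     # positive predictive value
        "NPV": tn / (tn + fn),     # negative predictive value
        "MER": 1.0 - acc,          # misclassification error rate
    }


def frobenius_norm(true_matrix, imputed_matrix):
    """Frobenius norm of the entrywise difference of two matrices."""
    return math.sqrt(sum(
        (t - m) ** 2
        for true_row, imputed_row in zip(true_matrix, imputed_matrix)
        for t, m in zip(true_row, imputed_row)
    ))


# Hypothetical feature-selection outcome: 40 true positives,
# 10 false positives, 45 true negatives, 5 false negatives.
indices = classification_indices(tp=40, fp=10, tn=45, fn=5)

# Hypothetical 2 x 2 data matrix and its imputed counterpart.
error = frobenius_norm([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 2.0]])
```

A smaller Frobenius norm means the imputed matrix is closer to the true one, while the confusion-matrix indices measure how well downstream feature selection recovers the truly relevant features.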


Articles published on Large-scale Metabolomics Data

MetMiner: A user-friendly pipeline for large-scale plant metabolomics data analysis.

MetaboLink: A web application for Streamlined Processing and Analysis of Large-Scale Untargeted Metabolomics Data.

The underappreciated diversity of bile acid modifications

COVRECON: automated integration of genome- and metabolome-scale network reconstruction and data-driven inverse modeling of metabolic interaction networks.

Distinct metabolic features of genetic liability to type 2 diabetes and coronary artery disease: a reverse Mendelian randomization study.

Norm ISWSVR: A Data Integration and Normalization Approach for Large-Scale Metabolomics.

RMisbeta: A robust missing value imputation approach in transcriptomics and metabolomics data

Metabolomics-Based Screening of Inborn Errors of Metabolism: Enhancing Clinical Application with a Robust Computational Pipeline.

A hierarchical approach to removal of unwanted variation for large-scale metabolomics data

Reference Standardization for Quantification and Harmonization of Large-Scale Metabolomics.

A novel analysis method for biomarker identification based on horizontal relationship: identifying potential biomarkers from large-scale hepatocellular carcinoma metabolomics data.

Statistical Workflow for Feature Selection in Human Metabolomics Data.

WaveICA: A novel algorithm to remove batch effects for large-scale untargeted metabolomics data based on wavelet analysis

Metabolomics in epidemiologic research: challenges and opportunities for early-career epidemiologists.

Metabolic network segmentation: A probabilistic graphical modeling approach to identify the sites and sequential order of metabolic regulation from non-targeted metabolomics data.

Potential Impact and Study Considerations of Metabolomics in Cardiovascular Health and Disease: A Scientific Statement From the American Heart Association.

Sparse network modeling and metscape-based visualization methods for the analysis of large-scale metabolomics data.

Normalization and integration of large-scale metabolomics data using support vector regression

Embracing new-generation 'omics' tools to improve drought tolerance in cereal and food-legume crops

Bayesian Independent Component Analysis Recovers Pathway Signatures from Blood Metabolomics Data

