Machine learning methods for metabolic pathway prediction

Joseph M Dale, Peter D Karp, Liviu Popescu

doi:10.1186/1471-2105-11-15

Abstract

Background: A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism.

Results: To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. Unlike PathoLogic, the ML-based methods output a probability for each predicted pathway, which provides more information to the user and facilitates filtering of predicted pathways.

Conclusions: ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.
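
The accuracy and F-measure figures quoted in the abstract, and the filtering that per-pathway probabilities make possible, can be illustrated with a minimal sketch. It assumes that the F-measure is the balanced F1 score (the harmonic mean of precision and recall) and that a pathway is called present when its predicted probability meets a chosen threshold. The function name `evaluate`, the threshold values, and the toy labels and probabilities are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], probs: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Turn probabilities into present/absent calls and score them."""
    # A pathway is called present (1) when its probability meets the threshold.
    preds = [1 if p >= threshold else 0 for p in probs]
    pairs = list(zip(y_true, preds))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    # Assumption: F-measure means the balanced F1 score.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Hypothetical toy data: 1 = pathway present, 0 = pathway absent.
Y_TRUE = [1, 1, 1, 0, 0, 0, 0, 0]
PROBS = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3, 0.05]

if __name__ == "__main__":
    for threshold in (0.35, 0.5, 0.8):
        print(threshold, evaluate(Y_TRUE, PROBS, threshold))
```

On this toy data, lowering the threshold raises recall at the cost of precision, and raising it does the opposite. A method that emits only binary calls offers no such adjustment.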

Highlights

A key challenge in systems biology is the reconstruction of an organism’s metabolic network from its genome sequence.
The Boolean and discretized numeric features are ranked by information gain, and the numeric features by the area under the receiver operating characteristic (ROC) curve (AUC); these measures are described in the section “Performance Evaluation”.
The results presented here do not show the full performance of the machine learning (ML) methods: because these methods estimate the probability that each pathway is present in an organism, rather than making binary present/absent calls, they offer a tradeoff between sensitivity and specificity.
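
The two ranking measures named in the highlights can be sketched as follows. This is a minimal illustration using the standard definitions of information gain (the reduction in label entropy after splitting on a feature) and of AUC (the probability that a randomly chosen positive instance scores higher than a randomly chosen negative one, with ties counted as one half). The function names and any data passed to them are hypothetical, and the paper's own implementation may differ.

```python
import math
from typing import Hashable, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy, in bits, of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


def information_gain(feature: Sequence[Hashable],
                     labels: Sequence[int]) -> float:
    """Label entropy minus the weighted entropy within each feature value."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

A Boolean feature that splits present from absent pathways perfectly has an information gain equal to the label entropy, and a numeric feature that ranks every present pathway above every absent one has an AUC of 1.0. An uninformative feature scores near 0 and 0.5 respectively.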

Summary

Introduction

A key challenge in systems biology is the reconstruction of an organism’s metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. The pathway prediction problem is stated as follows: given the reactome of an organism and its annotated genome, predict the set of metabolic pathways present in the organism. Pathway prediction can involve predicting pathways that were previously known in other organisms, or predicting novel pathways that have not been previously observed (pathway discovery). Our methodology does the former, predicting pathways from a curated reference database.
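
The shape of the pathway prediction problem stated above can be made concrete with a deliberately naive sketch. It is neither PathoLogic nor any of the ML methods evaluated in the paper. It assumes a reference database represented as a mapping from pathway identifiers to reaction identifiers, and an organism's reactome represented as a set of reaction identifiers; all identifiers, the single feature (the fraction of a pathway's reactions found in the reactome), and the cutoff are hypothetical.

```python
from typing import Dict, List, Set


def fraction_present(pathway_reactions: List[str], reactome: Set[str]) -> float:
    """Fraction of a pathway's reactions that occur in the organism's reactome."""
    if not pathway_reactions:
        return 0.0
    found = sum(1 for rxn in pathway_reactions if rxn in reactome)
    return found / len(pathway_reactions)


def predict_pathways(reference: Dict[str, List[str]], reactome: Set[str],
                     cutoff: float = 0.5) -> Dict[str, float]:
    """Return the reference pathways whose score meets the cutoff, with scores."""
    scores = {name: fraction_present(rxns, reactome)
              for name, rxns in reference.items()}
    return {name: s for name, s in scores.items() if s >= cutoff}


# Hypothetical reference database and reactome (identifiers are made up).
REFERENCE = {
    "PWY-A": ["RXN-1", "RXN-2", "RXN-3"],
    "PWY-B": ["RXN-4", "RXN-5"],
}
REACTOME = {"RXN-1", "RXN-2", "RXN-5"}

if __name__ == "__main__":
    print(predict_pathways(REFERENCE, REACTOME, cutoff=0.6))
```

A single hand-picked score like this stands in for what the paper does with a collection of 123 pathway features fed to trained classifiers; the sketch only fixes the inputs and outputs of the task.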

Journal: BMC Bioinformatics
Publication Date: Jan 8, 2010
Citations: 178
License type: CC BY 2.0
