Abstract

Background: Microbiome biomarker discovery for patient diagnosis, prognosis, and risk evaluation is attracting broad interest. Selected groups of microbial features provide signatures that characterize host disease states such as cancer or cardio-metabolic diseases. Yet, the current predictive models stemming from machine learning still behave as black boxes and seldom generalize well. Their interpretation is challenging for physicians and biologists, which makes them difficult to trust and use routinely in the physician–patient decision-making process. Novel methods that provide interpretability and biological insight are needed. Here, we introduce “predomics”, an original machine learning approach inspired by microbial ecosystem interactions that is tailored for metagenomics data. It discovers accurate predictive signatures and provides unprecedented interpretability. The decision provided by the predictive model is based on a simple, yet powerful score computed by adding, subtracting, or dividing cumulative abundance of microbiome measurements.

Results: Tested on >100 datasets, we demonstrate that predomics models are simple and highly interpretable. Even with such simplicity, they are at least as accurate as state-of-the-art methods. The family of best models, discovered during the learning process, offers the ability to distil biological information and to decipher the predictability signatures of the studied condition. In a proof-of-concept experiment, we successfully predicted body corpulence and metabolic improvement after bariatric surgery using pre-surgery microbiome data.

Conclusions: Predomics is a new algorithm that helps in providing reliable and trustworthy diagnostic decisions in the microbiome field. Predomics is in accord with societal and legal requirements that plead for an explainable artificial intelligence approach in the medical field.


Summary

Background

An increasing wealth of data from high-throughput molecular and imaging technologies is connecting biomedical sciences and machine learning (ML).

The potential competition between oral and gut microbes reported in previous studies [35] is best reflected by Ter and Ratio models with genus abundance data, which combine Veillonella (oral bacteria; opportunistic pathogen), enriched in liver cirrhosis, on one side and Bacteroides plus Eubacterium (S9) or Coprococcus (S8), enriched in controls, on the other. The latter represent butyrate producers (Coprococcus and Eubacterium) and complex polysaccharide degraders (Bacteroides genus) [36]. The best Ratio and Ter models (1–3) include oral bacterial species of the genera Veillonella (Veillonella unclassified) and Streptococcus (S. parasanguinis and S. anginosus), and opportunistic pathogens like Megasphaera micronuciformis that proliferate in patients with liver cirrhosis, whereas butyrate producers of the genus Subdoligranulum (Subdoligranulum unclassified), closely related to Faecalibacterium prausnitzii [37], and complex polysaccharide-degrading species like Bacteroides cellulosilyticus [38] characterize control subjects. BTR models are more accurate than literature-based ones and have the ability to distil and capture the predictive biological information embedded in the data.
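To make the idea of a score built from cumulative abundances concrete, the following is a minimal Python sketch, not the predomics implementation. It assumes that a Ter-style score is the cumulative abundance of one feature group minus that of another, and that a Ratio-style score is the quotient of the two. The function names, the `epsilon` guard, the decision threshold, and the abundance values are all hypothetical and chosen only for illustration.

```python
from typing import Mapping, Sequence


def cumulative_abundance(sample: Mapping[str, float], features: Sequence[str]) -> float:
    """Sum the abundances of the listed features; absent features count as 0."""
    return sum(sample.get(name, 0.0) for name in features)


def ter_score(sample: Mapping[str, float],
              positive: Sequence[str],
              negative: Sequence[str]) -> float:
    """Assumed Ter-style score: one group's cumulative abundance minus the other's."""
    return cumulative_abundance(sample, positive) - cumulative_abundance(sample, negative)


def ratio_score(sample: Mapping[str, float],
                numerator: Sequence[str],
                denominator: Sequence[str],
                epsilon: float = 1e-9) -> float:
    """Assumed Ratio-style score: quotient of the two cumulative abundances.

    epsilon is a hypothetical guard against division by zero.
    """
    return cumulative_abundance(sample, numerator) / (
        cumulative_abundance(sample, denominator) + epsilon
    )


def predict(score: float, threshold: float) -> bool:
    """Classify as 'case' when the score exceeds a (hypothetical) threshold."""
    return score > threshold


if __name__ == "__main__":
    # Illustrative relative abundances only; these are not data from the study.
    sample = {"Veillonella": 0.04, "Bacteroides": 0.20, "Eubacterium": 0.05}
    case_side = ["Veillonella"]
    control_side = ["Bacteroides", "Eubacterium"]

    print("Ter score:", ter_score(sample, case_side, control_side))
    print("Ratio score:", ratio_score(sample, case_side, control_side))
    print("Predicted case:", predict(ter_score(sample, case_side, control_side), 0.0))
```

Under these assumptions, a model is fully described by two short lists of taxa and one threshold, which is what makes the decision easy to read: a sample is flagged when the taxa on one side outweigh those on the other.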
