Exploring machine learning for untargeted metabolomics using molecular fingerprints

Christel Sirocchi, Federica Biancucci, Matteo Donati, Alessandro Bogliolo, Mauro Magnani, Michele Menotta, Sara Montagna

doi:10.1016/j.cmpb.2024.108163

Open Access

https://doi.org/10.1016/j.cmpb.2024.108163

Abstract

Background: Metabolomics, the study of substrates and products of cellular metabolism, offers valuable insights into an organism's state under specific conditions and has the potential to revolutionise preventive healthcare and pharmaceutical research. However, analysing large metabolomics datasets remains challenging, with available methods relying on limited and incompletely annotated metabolic pathways.

Methods: This study, inspired by well-established methods in drug discovery, applies machine learning to metabolite fingerprints to explore the relationship between metabolite structure and response under experimental conditions beyond known pathways, shedding light on metabolic processes. It evaluates the effectiveness of fingerprinting in representing metabolites, addressing challenges such as class imbalance, data sparsity, high dimensionality, duplicate structural encoding, and feature interpretability. Feature importance analysis is then applied to reveal the key chemical configurations affecting classification and to identify related metabolite groups.

Results: The approach is tested on two datasets: one on Ataxia Telangiectasia and another on endothelial cells under low oxygen. Machine learning on molecular fingerprints predicts metabolite responses effectively, and feature importance analysis aligns with known metabolic pathways while unveiling new affected metabolite groups for further study.

Conclusion: The presented approach leverages the strengths of drug discovery to address critical issues in metabolomics research and aims to bridge the gap between these two disciplines. This work lays the foundation for future research in this direction, possibly exploring alternative structural encodings and machine learning models.
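The preprocessing challenges named in the Methods (duplicate structural encoding, data sparsity, class imbalance) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the metabolite names, the six-bit fingerprints, and the labels are hypothetical toy data, and the per-bit frequency gap is only a crude stand-in for model-based feature importance. The abstract does not specify the fingerprint type or the model, so this is not the authors' pipeline.

```python
from collections import Counter

# Hypothetical toy data (not from the paper): each metabolite is a short
# binary fingerprint (1 = substructure present) with a response label.
fingerprints = {
    "met_A": (1, 0, 1, 0, 0, 1),
    "met_B": (1, 0, 1, 0, 0, 1),  # same encoding as met_A
    "met_C": (0, 0, 1, 1, 0, 0),
    "met_D": (0, 0, 0, 1, 0, 0),
    "met_E": (1, 0, 0, 0, 0, 1),
    "met_F": (0, 0, 0, 1, 0, 1),
}
labels = {
    "met_A": "changed", "met_B": "changed", "met_C": "unchanged",
    "met_D": "unchanged", "met_E": "changed", "met_F": "unchanged",
}


def deduplicate(fps):
    """Keep the first metabolite seen for each distinct fingerprint."""
    seen = {}
    for name, bits in fps.items():
        seen.setdefault(bits, name)
    return {name: bits for bits, name in seen.items()}


def drop_constant_bits(fps):
    """Remove bit positions that have the same value in every fingerprint."""
    rows = list(fps.values())
    keep = [i for i in range(len(rows[0])) if len({r[i] for r in rows}) > 1]
    reduced = {name: tuple(bits[i] for i in keep) for name, bits in fps.items()}
    return reduced, keep


def balanced_class_weights(ys):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(ys)
    n = len(ys)
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def bit_frequency_gap(fps, labels, positive):
    """Per-bit 'on' frequency in the positive class minus the other class."""
    pos = [b for n, b in fps.items() if labels[n] == positive]
    neg = [b for n, b in fps.items() if labels[n] != positive]
    width = len(pos[0])
    return [
        sum(r[i] for r in pos) / len(pos) - sum(r[i] for r in neg) / len(neg)
        for i in range(width)
    ]


unique = deduplicate(fingerprints)
reduced, keep = drop_constant_bits(unique)
weights = balanced_class_weights([labels[n] for n in reduced])
gaps = bit_frequency_gap(reduced, labels, positive="changed")

for position, gap in zip(keep, gaps):
    print(f"bit {position}: frequency gap {gap:+.2f}")
```

In a real analysis the retained bits and class weights would be passed to a classifier, and importance would be read from the fitted model rather than from raw frequencies. The sketch only shows why each preprocessing step changes what the model sees.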

Full Text