SMILE: systems metabolomics using interpretable learning and evolution

Ting Hu, Chengyuan Sha, Miroslava Cuperlovic-Culf

doi:10.1186/s12859-021-04209-1

Open Access

https://doi.org/10.1186/s12859-021-04209-1

Journal: BMC Bioinformatics
Publication Date: May 28, 2021
Citations: 10
License type: open-access

Affiliation: Queen's University

Abstract

Background

The direct link between metabolism and the phenotype of cells and organisms in health and disease makes metabolomics, the high-throughput study of small-molecule metabolites, an essential methodology for understanding and diagnosing disease development and progression. Machine learning methods have seen increasing adoption in metabolomics thanks to their powerful prediction abilities. However, the “black-box” nature of many machine learning models remains a major challenge for wide acceptance and utility, as it makes the decision process difficult to interpret. This challenge is particularly prominent in biomedical research, where understanding the underlying decision-making mechanism is essential for ensuring safety and gaining new knowledge.

Results

In this article, we propose a novel computational framework, Systems Metabolomics using Interpretable Learning and Evolution (SMILE), for supervised metabolomics data analysis. Our methodology uses an evolutionary algorithm to learn interpretable predictive models and to identify the most influential metabolites and their interactions in association with disease. Moreover, we have developed a web application with a graphical user interface that can be used for easy analysis, interpretation and visualization of the results. The performance of the method and the use of the web interface are shown using metabolomics data for Alzheimer’s disease (AD).

Conclusions

SMILE was able to identify several metabolites influential in AD and to provide interpretable predictive models that can be further used for a better understanding of the metabolic background of AD. SMILE addresses the emerging issue of interpretability and explainability in machine learning, and contributes to more transparent and powerful applications of machine learning in bioinformatics.

Highlights

The direct link between metabolism and the phenotype of cells and organisms in health and disease makes metabolomics, the high-throughput study of small-molecule metabolites, an essential methodology for understanding and diagnosing disease development and progression
We propose a new interpretable machine learning framework for metabolic data analysis
The dataset includes patients with Alzheimer’s disease (AD), patients with amnestic mild cognitive impairment, and 57 healthy individuals as controls

Summary

Introduction

The direct link between metabolism and the phenotype of cells and organisms in health and disease makes metabolomics, the high-throughput study of small-molecule metabolites, an essential methodology for understanding and diagnosing disease development and progression. The “black-box” nature of many machine learning models remains a major challenge for wide acceptance and utility, as it makes the decision process difficult to interpret. This challenge is prominent in biomedical research, where understanding the underlying decision-making mechanism is essential for ensuring safety and gaining new knowledge. More complex learning methods, such as artificial neural networks (ANN) and ensemble learning, can provide high prediction accuracy but are almost impossible to interpret [2]. These models remain mostly “black boxes”, where insights about the data and the working mechanisms of decision making are hidden in the increasingly complex structures of the models. Numerous parameters are needed to describe such a model, and it is impossible to entirely understand its inner workings [3, 4].

Methods

Results

Discussion

Conclusion