Interpretable genotype-to-phenotype classifiers with performance guarantees

Alexandre Drouin, François Laviolette, Mario Marchand, Frédéric Raymond, Jacques Corbeil, Gaël Letarte

doi:10.1038/s41598-019-40561-2


Open Access

https://doi.org/10.1038/s41598-019-40561-2


Journal: Scientific Reports
Publication Date: Mar 11, 2019
Citations: 80
License type: open-access

Affiliation: Université Laval

Abstract

Understanding the relationship between the genome of a cell and its phenotype is a central problem in precision medicine. Nonetheless, genotype-to-phenotype prediction comes with great challenges for machine learning algorithms that limit their use in this setting. The high dimensionality of the data tends to hinder generalization and challenges the scalability of most learning algorithms. Additionally, most algorithms produce models that are complex and difficult to interpret. We alleviate these limitations by proposing strong performance guarantees, based on sample compression theory, for rule-based learning algorithms that produce highly interpretable models. We show that these guarantees can be leveraged to accelerate learning and improve model interpretability. Our approach is validated through an application to the genomic prediction of antimicrobial resistance, an important public health concern. Highly accurate models were obtained for 12 species and 56 antibiotics, and their interpretation revealed known resistance mechanisms, as well as some potentially new ones. An open-source disk-based implementation that is both memory and computationally efficient is provided with this work. The implementation is turnkey, requires no prior knowledge of machine learning, and is complemented by comprehensive tutorials.

Highlights

The relationship between the genome of a cell and its phenotype is central to precision medicine
107 binary classification datasets were extracted; in each, the task is to discriminate isolates of a given species that are resistant to an antimicrobial agent from those that are susceptible, based on their genomes
Predicting phenotypes from genotypes is a problem of high significance for biology that comes with great challenges for learning algorithms

Summary

Introduction

The relationship between the genome of a cell and its phenotype is central to precision medicine. Two algorithms that learn rule-based models are explored: (i) Classification and Regression Trees[8] (CART) and (ii) Set Covering Machines[9] (SCM). The former learns decision trees, which are hierarchical arrangements of rules, and the latter learns conjunctions (logical-AND) and disjunctions (logical-OR), which are simple logical combinations of rules. The accuracy and interpretability of the resulting models are demonstrated with an application to the prediction of antimicrobial resistance (AMR) in bacteria, a global public health concern.
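To make the three model shapes concrete, here is a minimal Python sketch of a conjunction, a disjunction, and a small decision tree built from boolean rules. It is an illustration only, not the authors' implementation: the representation of a genome as a dictionary of boolean features, the feature names (`feature_A`, `feature_B`, `feature_C`), and all helper functions are assumptions introduced for this example, and the models are written by hand rather than learned from data.

```python
from typing import Callable, Dict, List

# Assumption for illustration: a genome is a dict mapping hypothetical
# feature names to True (present) or False (absent).
Genome = Dict[str, bool]
Rule = Callable[[Genome], bool]


def presence(feature: str) -> Rule:
    """Rule that holds when the feature is present in the genome."""
    return lambda genome: genome.get(feature, False)


def absence(feature: str) -> Rule:
    """Rule that holds when the feature is absent from the genome."""
    return lambda genome: not genome.get(feature, False)


def conjunction(rules: List[Rule]) -> Rule:
    """Logical-AND: predicts True only if every rule holds."""
    return lambda genome: all(rule(genome) for rule in rules)


def disjunction(rules: List[Rule]) -> Rule:
    """Logical-OR: predicts True if at least one rule holds."""
    return lambda genome: any(rule(genome) for rule in rules)


def tree_model(genome: Genome) -> bool:
    """Hand-written decision tree: a hierarchical arrangement of rules."""
    if genome.get("feature_A", False):
        return True
    if genome.get("feature_B", False):
        # Deeper rule, only consulted when feature_A is absent.
        return not genome.get("feature_C", False)
    return False


# Hypothetical isolates; True would stand for "resistant".
isolates = [
    {"feature_A": True, "feature_B": False, "feature_C": False},
    {"feature_A": False, "feature_B": True, "feature_C": False},
    {"feature_A": False, "feature_B": True, "feature_C": True},
]

and_model = conjunction([presence("feature_B"), absence("feature_C")])
or_model = disjunction([presence("feature_A"), presence("feature_C")])

and_predictions = [and_model(g) for g in isolates]
or_predictions = [or_model(g) for g in isolates]
tree_predictions = [tree_model(g) for g in isolates]
```

Each model can be read directly as a statement about the genome, for example "resistant if feature_B is present AND feature_C is absent", which is the sense in which such rule-based models are considered interpretable.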

