Abstract

Machine learning algorithms that are both interpretable and accurate are essential in applications such as medicine, where errors can have dire consequences. Unfortunately, there is currently a tradeoff between accuracy and interpretability among state-of-the-art methods. Decision trees are interpretable and are therefore used extensively throughout medicine for stratifying patients. Current decision tree algorithms, however, are consistently outperformed in accuracy by other, less interpretable machine learning models, such as ensemble methods. We present MediBoost, a novel framework for constructing decision trees that retain interpretability while achieving accuracy similar to that of ensemble methods, and compare MediBoost’s performance to that of conventional decision trees and ensemble methods on 13 medical classification problems. MediBoost significantly outperformed current decision tree algorithms on 11 of the 13 problems, giving accuracy comparable to ensemble methods. The resulting trees are of the same type as the decision trees used throughout clinical practice but have the advantage of improved accuracy. Our algorithm thus gives the best of both worlds: it grows a single, highly interpretable tree that has the high accuracy of ensemble methods.

Highlights

  • Patient stratification involves the integration of complex data structures that include gene-expression patterns, individual proteins, proteomic patterns, metabolomics, histology, and imaging [2], all of which machine learning algorithms can analyze effectively

  • We present a framework for constructing decision trees that match the accuracy of ensemble methods while maintaining high interpretability

  • When the areas under the curve (AUC) are compared using the Wilcoxon signed-rank test with Bonferroni adjustment for multiple comparisons, MediBoost is significantly better than ID3 (p = 8.69 × 10⁻¹⁰) and CART (p = 8.89 × 10⁻⁹) but not significantly different from LogitBoost (p = 0.85); a sketch of this comparison follows the highlights

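The statistical comparison in the last highlight can be reproduced in outline with standard tools. The sketch below is illustrative only: the AUC values are randomly generated placeholders rather than the paper's results, and SciPy's wilcoxon is used as the signed-rank test.

    import numpy as np
    from scipy.stats import wilcoxon

    rng = np.random.default_rng(0)

    # Placeholder AUCs for 13 paired problems; illustrative only, not the paper's data.
    auc_mediboost = rng.uniform(0.80, 0.95, size=13)
    auc_cart = auc_mediboost - rng.uniform(0.01, 0.10, size=13)

    # Paired, non-parametric comparison of per-dataset AUCs.
    stat, p = wilcoxon(auc_mediboost, auc_cart)

    # Bonferroni adjustment: multiply by the number of pairwise comparisons
    # (three in the paper's setting: vs. ID3, CART, and LogitBoost).
    p_adjusted = min(p * 3, 1.0)
    print(f"Wilcoxon statistic = {stat:.2f}, Bonferroni-adjusted p = {p_adjusted:.4g}")

The Wilcoxon signed-rank test is a natural choice here because the same 13 datasets are scored by every method, so the per-dataset AUC differences form paired samples with no normality assumption.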

Introduction

Patient stratification involves the integration of complex data structures that include gene-expression patterns, individual proteins, proteomic patterns, metabolomics, histology, and imaging [2], all of which machine learning algorithms can analyze effectively. Other sources of information, such as electronic medical records, the scientific literature, and physician experience and intuition, are more difficult to integrate. For this reason, interpretability is a core requirement for machine-learned models used in medicine. A classifier is considered interpretable if its classifications can be explained by a conjunction of conditional statements, i.e., if-then rules, about the collected data, in our case the data used for patient stratification; this view of a tree as a rule list is sketched below. Under this definition, standard decision trees, such as those learned by ID3 or CART, are considered interpretable, but ensemble methods are not. We present a framework for constructing decision trees that match the accuracy of ensemble methods while maintaining high interpretability. This unique combination of accuracy and interpretability addresses a long-standing challenge in machine learning that is essential for medical applications. The applications of our algorithm are not limited to medicine; it could be used in any other domain that employs decision trees.
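The if-then view of a decision tree can be made concrete in a few lines of code. The sketch below is a minimal illustration, assuming scikit-learn and using its bundled breast cancer dataset as a stand-in for a patient-stratification problem; the dataset and tree depth are our choices, not the paper's.

    from sklearn.datasets import load_breast_cancer
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_breast_cancer()

    # A shallow CART-style tree keeps the extracted rule list short enough to read.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

    # Each root-to-leaf path printed below is one conjunction of if-then
    # conditions over the measured features, i.e., an interpretable rule.
    print(export_text(tree, feature_names=list(data.feature_names)))

An ensemble of such trees would require reading hundreds of these rule lists and combining their votes, which is why ensemble methods fail this interpretability criterion even though each constituent tree passes it.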

