Suggestion of statistical validation on feature importance of machine learning.

Youngro Lee, Jongmo Seo

doi:10.1109/embc40787.2023.10340208

Abstract

Feature importance methods are widely used in machine learning analysis of medical datasets, as both primary and subsidiary tools. These methods aid in selecting biomarkers or other markers that indicate target diseases, and they can provide valuable insight into the mechanism of a disease. However, simply listing features with their corresponding importance ranks is not sufficient to determine whether those features are statistically significant. In this paper, we propose a simple method for evaluating the statistical significance of feature importance values and selecting the optimal number of biomarkers. We demonstrate the application of this method using a publicly available dataset on heart failure.

Clinical Relevance: For important indicators to be clinically useful, their statistical significance must be defined. By proposing a simple method for calculating statistical significance, this paper enables clinicians to select a group of biomarkers based on their feature importance in a machine learning model. This approach improves the accuracy and effectiveness of clinical decision-making, leading to more precise diagnosis, treatment, and management of various medical conditions.

Full Text