Abstract

Background: In high-dimensional data analysis, the complexity of predictive models can be reduced by selecting the most relevant features, which is crucial to reduce data noise and increase model accuracy and interpretability. Thus, in the field of clinical decision making, only the most relevant features from a set of medical descriptors should be considered when determining whether a patient is healthy or not. This statistical approach, known as feature selection, can be performed through regression or classification, in a supervised or unsupervised manner. Several feature selection approaches using different mathematical concepts have been described in the literature. In the field of classification, a new approach has recently been proposed that uses the gamma-metric, an index measuring separability between different classes in heart rhythm characterization. The present study proposes a filter approach for feature selection in classification using this gamma-metric, and evaluates its application to automatic atrial fibrillation detection.

Methods: The stability and prediction performance of the gamma-metric feature selection approach were evaluated using the support vector machine model on two heart rhythm datasets, one extracted from the PhysioNet database and the other from the database of Marseille University Hospital Center, France (Timone Hospital). Both datasets contained electrocardiogram recordings grouped into two classes: normal sinus rhythm and atrial fibrillation. The performance of this feature selection approach was compared to that of three other approaches, the first two based on the Random Forest technique and the third on receiver operating characteristic curve analysis.

Results: The gamma-metric approach showed satisfactory results, especially for models with a smaller number of features. For the training dataset, all prediction indicators were higher for our approach (accuracy greater than 99% for models with 5 to 17 features), as was stability (greater than 0.925 regardless of the number of features included in the model). For the validation dataset, the features selected with the gamma-metric approach differed from those selected with the other approaches; sensitivity was higher for our approach, but other indicators were similar.

Conclusion: This filter approach for feature selection in classification opens up new methodological avenues for atrial fibrillation detection using short electrocardiogram recordings.
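
To make the idea of a filter approach concrete, the sketch below ranks features one at a time by a separability score and keeps the top k before any classifier is trained. It does not implement the gamma-metric, whose definition is not given in this text; as a stand-in it uses the area under the ROC curve, in the spirit of the ROC-based comparator mentioned above. The function names (`auc`, `rank_features`, `select_top_k`) and the binary 0/1 label coding are assumptions made for illustration.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[int]:
    """Return feature indices sorted from most to least separating."""
    n_features = len(X[0])
    relevance = []
    for j in range(n_features):
        column = [row[j] for row in X]
        # An AUC of 0.5 means no separation, so relevance is the distance from 0.5.
        relevance.append(abs(auc(column, y) - 0.5))
    return sorted(range(n_features), key=lambda j: relevance[j], reverse=True)


def select_top_k(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Filter step: keep the k highest-ranked features (assumed stand-in score)."""
    return rank_features(X, y)[:k]
```

The selected indices would then be passed to a classifier such as an SVM; because the ranking never consults the classifier, the selection step is a filter rather than a wrapper.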

Highlights

  • In high-dimensional data analysis, the complexity of predictive models can be reduced by selecting the most relevant features, which is crucial to reduce data noise and increase model accuracy and interpretability

  • This filter approach for feature selection in classification opens up new methodological avenues for atrial fibrillation detection using short electrocardiogram recordings (Michel et al., BMC Med Inform Decis Mak 2021, 21(Suppl 4):130)

  • The present paper proposes a filter approach for feature selection in classification that uses the γ-metric introduced by Pons et al. [15] as an evaluation function, together with the support vector machine (SVM) model, to solve the supervised classification problem


Summary

Introduction

In high-dimensional data analysis, the complexity of predictive models can be reduced by selecting the most relevant features, which is crucial to reduce data noise and increase model accuracy and interpretability. In statistics and high-dimensional data analysis, the scoring and ranking of individual features may be necessary for feature selection and dimension reduction [1]. This approach reduces both the complexity of the model and the noise present in the data, which increases model accuracy and interpretability [2]. Feature selection is a data preprocessing technique that consists of generating the best possible feature subset by selecting the most relevant features and removing redundant or noisy ones. This technique speeds up classification (training and testing) and optimizes model accuracy (e.g., prediction error rate). Among the available search strategies is the “best-first search” approach [4], which is similar to the greedy approach, but differs from it in that it chooses the best neighbor subset among all evaluated ones. A recent paper comparing the performance of different state-of-the-art metaheuristic algorithms (including genetic algorithms, GA) found that these approaches may constitute good alternatives for the problem of parameter estimation in real-world applications [7].
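
The difference between the greedy approach and best-first search can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of reference [4]: `score` is a hypothetical evaluation function supplied by the caller, subsets grow by adding one feature at a time, and the `patience` stopping rule is an assumption introduced here to end the search.

```python
import heapq
from typing import Callable, FrozenSet

Subset = FrozenSet[int]


def greedy_forward(n_features: int, score: Callable[[Subset], float]) -> Subset:
    """Add the single best feature at each step; stop when nothing improves."""
    current: Subset = frozenset()
    current_score = score(current)
    while True:
        candidates = [current | {j} for j in range(n_features) if j not in current]
        if not candidates:
            return current
        best = max(candidates, key=score)
        if score(best) <= current_score:
            return current
        current, current_score = best, score(best)


def best_first(n_features: int, score: Callable[[Subset], float],
               patience: int = 5) -> Subset:
    """Always expand the best subset among all evaluated, unexpanded ones."""
    start: Subset = frozenset()
    best, best_score = start, float("-inf")
    # heapq is a min-heap, so scores are negated; the sorted tuple breaks ties.
    frontier = [(-score(start), tuple(sorted(start)))]
    seen = {start}
    stale = 0  # consecutive expansions without a new best (assumed stopping rule)
    while frontier and stale < patience:
        neg_score, members = heapq.heappop(frontier)
        subset = frozenset(members)
        if -neg_score > best_score:
            best, best_score, stale = subset, -neg_score, 0
        else:
            stale += 1
        for j in range(n_features):
            child = subset | {j}
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (-score(child), tuple(sorted(child))))
    return best
```

Because the frontier keeps every evaluated subset, best-first search can return to an earlier, less promising branch when the current one stalls, whereas the greedy search commits to each step and cannot recover a pair of features that is only useful jointly.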
