Abstract
Background and Objectives: With the progress of digitalization, electrocardiograms (ECGs) are increasingly recorded by embedded and portable devices, where noise and artifacts can significantly degrade signal quality. Signal quality assessment is therefore necessary before ECG interpretation. In particular, if the ECG training database is not balanced between good and bad signals, both the classification accuracy and the distribution of the data over the two classes are affected. This paper presents a comparative study of 10 re-sampling techniques for data balancing, each applied together with a random forest classifier. Methods: Based on this study, we propose a novel metric that accounts for the representativeness of the original data, to ensure that the quality assessment results stay close to the initial class partition. The classifier is evaluated on two criteria: its training performance and the representativeness coefficient. Note that the new measure is complementary to classification evaluation, and classifier performance retains first priority. We therefore use a multi-objective optimization (MOO) method to reconcile both criteria. Results: The hybrid balancing technique SMOTETomek satisfies both criteria well: it reaches a high training performance while maintaining data representativeness. Conclusion: The classifier with the best classification performance does not necessarily maintain the representativeness of the data, which underlines the importance of this study. The proposed metric should therefore be taken into consideration when selecting a classification method for the data that remain after the poor-quality signals have been discarded.
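As an illustration of how two such criteria might be reconciled, the following minimal sketch filters candidate balancing techniques by Pareto dominance and then lets performance take priority through a threshold. It is not necessarily the formulation used in the study: the function names, the `min_perf` threshold, the assumption that both scores are "higher is better", and the placeholder scores are all hypothetical.

```python
from typing import Dict, List, Tuple

# (training performance, representativeness); both assumed "higher is better".
Scores = Tuple[float, float]


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, (perf, rep) in candidates.items():
        dominated = any(
            (p >= perf and r >= rep) and (p > perf or r > rep)
            for other, (p, r) in candidates.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


def select(candidates: Dict[str, Scores], min_perf: float) -> str:
    """Give performance priority, then prefer representativeness.

    Keep Pareto-optimal candidates whose performance reaches min_perf and
    return the most representative one. If none qualifies, fall back to
    the best-performing candidate overall.
    """
    eligible = [n for n in pareto_front(candidates) if candidates[n][0] >= min_perf]
    if not eligible:
        return max(candidates, key=lambda n: candidates[n][0])
    return max(eligible, key=lambda n: candidates[n][1])


# Arbitrary placeholder scores for illustration only, not results from the study.
example = {
    "technique_A": (0.95, 0.60),
    "technique_B": (0.93, 0.90),
    "technique_C": (0.80, 0.95),
    "technique_D": (0.90, 0.85),
}

print(pareto_front(example))
print(select(example, min_perf=0.9))
```

In this sketch the best performer (`technique_A`) is not the one selected once representativeness is taken into account, which mirrors the point made in the conclusion.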