Using semi-supervised cluster method to correct the mislabeled training samples of ECG signals

Pengfei Wu, Senping Tian

doi:10.1109/ddcls49620.2020.9275143

Abstract

The classification accuracy of electrocardiogram (ECG) signals decreases when some samples in the training set are mislabeled. To mitigate this, a semi-supervised method is introduced to correct the mislabeled samples. The method rests on the principle that samples of the same category are more similar to one another than to samples of other categories, so in the feature space a sample has more neighbours of its own category than of other categories. Cross-validation divides the training set into a sub-training set and a validation set. The validation samples are treated as unlabeled, and a k-nearest-neighbour (KNN) classifier labels them according to the samples in the sub-training set. Because the sub-training set itself contains mislabeled samples, the KNN classifier cannot label every validation sample correctly in a single pass, so the procedure is applied iteratively, after which the mislabeled samples in the training set are largely corrected. Experiments on ECG signals from the MIT-BIH arrhythmia database demonstrate the effectiveness of the proposed method.
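The iterative cross-validation and KNN relabeling described in the abstract can be sketched as follows. This is a minimal illustration in pure Python, not the authors' implementation: the function names `knn_predict` and `correct_labels`, the Euclidean distance, the defaults `k=5`, `n_folds=5`, `max_iter=10`, the synchronous update after each full pass, and the "stop when no label changes" rule are all assumptions chosen for the example.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Label x by majority vote of its k nearest training samples (Euclidean)."""
    # Sort training samples by distance to x; sorted() is stable, so equal
    # distances keep their original order.
    neighbours = sorted(zip(train_X, train_y), key=lambda p: math.dist(x, p[0]))
    votes = Counter(label for _, label in neighbours[:k])
    # Counter.most_common keeps first-seen order among equal counts, so a tied
    # vote goes to the label that appears first among the nearest neighbours.
    return votes.most_common(1)[0][0]


def correct_labels(X, y, k=5, n_folds=5, max_iter=10, seed=0):
    """Iteratively relabel every sample with KNN fitted on the other folds.

    Hypothetical settings: k, n_folds, max_iter and the stopping rule are
    illustrative assumptions, not values taken from the paper.
    """
    y = list(y)  # work on a copy; the caller's labels are not modified
    order = list(range(len(X)))
    rng = random.Random(seed)
    for _ in range(max_iter):
        rng.shuffle(order)  # new random fold assignment on each pass
        folds = [order[f::n_folds] for f in range(n_folds)]
        new_y = list(y)
        for fold in folds:
            held_out = set(fold)
            # Sub-training set: every sample outside the current validation
            # fold, with its current (possibly still wrong) label.
            sub = [i for i in order if i not in held_out]
            train_X = [X[i] for i in sub]
            train_y = [y[i] for i in sub]
            # Validation samples are treated as unlabeled and relabeled by KNN.
            for i in fold:
                new_y[i] = knn_predict(train_X, train_y, X[i], k)
        changed = sum(a != b for a, b in zip(y, new_y))
        y = new_y
        if changed == 0:  # labels are stable, so stop iterating
            break
    return y
```

Each pass gives every training sample one turn in a validation fold, so every label is re-estimated from its neighbours in the rest of the data; repeating the pass lets corrections made in one round clean up the sub-training sets used in the next. Labels are updated only after a full pass so that every fold in a pass sees the same label set, which is one reasonable choice among several. The cap `max_iter` guards against labels that oscillate near class boundaries.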
