The paper addresses the common and recurring problem of electrocardiogram (ECG) classification based on heart rate variability (HRV) analysis. The limits of HRV analysis in diagnosing different cardiac conditions are not yet fully understood. Existing research suggests that a combination of carefully selected linear and nonlinear HRV features should significantly improve accuracy in both binary and multiclass classification problems. The primary goal of this work is to evaluate a proposed combination of HRV features. Secondary objectives are to compare different machine learning algorithms for HRV analysis and to determine the most suitable period T between two consecutively analyzed R-R intervals for the nonlinear features. We extracted 11 features from 5 min R-R interval recordings: SDNN (standard deviation of NN intervals), RMSSD (root mean square of successive differences), pNN20 (percentage of successive differences greater than 20 ms), HRV triangular index (HTI), spatial filling index (SFI), correlation dimension, central tendency measure (CTM), and four approximate entropy features (ApEn1-ApEn4). The analyzed heart conditions were normal heart rhythm, arrhythmia (any), supraventricular arrhythmia, and congestive heart failure. One hundred patient records from six online databases were analyzed, 25 for each condition. Feature vectors were extracted with a platform designed for this purpose, named ECG Chaos Extractor. The vectors were then analyzed with seven clustering and classification algorithms in the Weka system: K-means, expectation maximization (EM), C4.5 decision tree, Bayesian network, artificial neural network (ANN), support vector machines (SVM), and random forest (RF). Both four-class and two-class (normal vs. abnormal) classification were performed. The relevance of individual features was evaluated with 1-Rule and the C4.5 decision tree, both for classification with single features and for classification with pairs of features.
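As a minimal sketch, the four linear features can be computed from a list of R-R intervals in milliseconds as shown below. This is not the ECG Chaos Extractor implementation; it assumes the conventional definitions (sample standard deviation for SDNN, a strict 20 ms threshold on absolute successive differences for pNN20, and a histogram bin width of 1/128 s for HTI), and the function name `linear_hrv_features` is hypothetical.

```python
import math
from statistics import stdev


def linear_hrv_features(rr_ms):
    """Time-domain HRV features from a list of R-R intervals in milliseconds.

    Hypothetical helper; assumes conventional textbook definitions.
    """
    if len(rr_ms) < 3:
        raise ValueError("need at least three R-R intervals")
    # Successive differences between neighbouring R-R intervals
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]

    # SDNN: sample standard deviation of all intervals
    sdnn = stdev(rr_ms)
    # RMSSD: root mean square of successive differences
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # pNN20: percentage of successive differences larger than 20 ms
    pnn20 = 100.0 * sum(1 for d in diffs if abs(d) > 20) / len(diffs)

    # HTI: total number of intervals divided by the height of the
    # histogram mode, using the conventional bin width of 1/128 s
    bin_width = 1000.0 / 128  # 7.8125 ms
    counts = {}
    for rr in rr_ms:
        b = math.floor(rr / bin_width)
        counts[b] = counts.get(b, 0) + 1
    hti = len(rr_ms) / max(counts.values())

    return {"SDNN": sdnn, "RMSSD": rmssd, "pNN20": pnn20, "HTI": hti}
```

For example, `linear_hrv_features([800, 810, 790, 830, 800])` gives pNN20 = 50.0 (two of the four successive differences exceed 20 ms) and HTI = 2.5 (five intervals, with the fullest bin holding two).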
The average total classification accuracies of the top three methods in the two-class case were: RF 99.7%, ANN 99.1%, and SVM 98.9%. In the four-class case, the best results were: RF 99.6%, Bayesian network 99.4%, and SVM 98.4%. RF was the best method overall. The C4.5 decision tree succeeded in constructing useful classification rules for the two-class case. EM and K-means gave comparable clustering results: around 50% in the four-class case and around 75% in the two-class case. HTI, pNN20, RMSSD, ApEn3, ApEn4, and SFI were shown to be the most relevant features. HTI in particular appears in most of the top-ranked feature pairs and is the best of the analyzed features. The choice of the period T for the nonlinear features was shown to be arbitrary. However, combining five different periods significantly improved classification accuracy, from 70% with a single period up to 99% with five periods. The analysis shows that the proposed combination of 11 linear and nonlinear HRV features yields high classification accuracy when the nonlinear features are extracted for five periods. The feature combination was thoroughly analyzed using several machine learning algorithms; the RF algorithm in particular proved highly efficient and accurate in both binary and multiclass classification of HRV records, and the C4.5 decision tree produced interpretable and useful rules. Further work in this area should establish which features to extract for the best classification of specific types of cardiac disorders.
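The multi-period idea can be illustrated with a sketch for two of the nonlinear features, approximate entropy (ApEn, after Pincus) and CTM. It assumes that the period T acts as a delay between the R-R intervals combined into one embedding vector (for ApEn) or into one point of the second-order difference plot (for CTM). The parameter values (m = 2, r = 20 ms, radius 50 ms, periods 1 to 5) and all function names are hypothetical. The abstract does not say how ApEn1-ApEn4 differ, so only a single ApEn is shown, and SFI and the correlation dimension are omitted.

```python
import math


def approximate_entropy(x, m=2, r=20.0, lag=1):
    """Approximate entropy with delay embedding; `lag` plays the role of T (assumed)."""
    def phi(dim):
        n = len(x) - (dim - 1) * lag
        vectors = [[x[i + k * lag] for k in range(dim)] for i in range(n)]
        total = 0.0
        for u in vectors:
            # Count vectors within Chebyshev distance r; the self-match is
            # included, so the count is never zero and log() is defined
            c = sum(1 for v in vectors
                    if max(abs(a - b) for a, b in zip(u, v)) <= r)
            total += math.log(c / n)
        return total / n

    return phi(m) - phi(m + 1)


def central_tendency_measure(x, radius=50.0, lag=1):
    """Fraction of second-order difference plot points inside `radius`."""
    pts = [(x[i + lag] - x[i], x[i + 2 * lag] - x[i + lag])
           for i in range(len(x) - 2 * lag)]
    inside = sum(1 for a, b in pts if math.hypot(a, b) < radius)
    return inside / len(pts)


def multi_period_features(x, periods=(1, 2, 3, 4, 5)):
    """Concatenate the nonlinear features computed at each period T."""
    feats = []
    for t in periods:
        feats.append(approximate_entropy(x, lag=t))
        feats.append(central_tendency_measure(x, lag=t))
    return feats
```

Under this assumption, each nonlinear feature contributes one value per period, so the vector grows with the number of periods while the linear features stay fixed. The pairwise ApEn comparison is quadratic in the number of intervals, which remains manageable for the few hundred beats in a 5 min recording.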