Detection of congestive heart failure from short-term heart rate variability segments using hybrid feature selection approach

Alan Jovic, Karla Brkic, Goran Krstacic

doi:10.1016/j.bspc.2019.101583

Abstract

Objectives: The aim of this work is to investigate the accuracy limits of automated detection of congestive heart failure (CHF) from short-term heart rate variability (HRV) series. Short-term HRV analysis uses 5-minute segments from HRV recordings to diagnose a disorder. This work proposes a hybrid feature selection procedure aimed at finding highly accurate models that contain only a few highly informative features, which enables physiological interpretation of the features relevant to the model. Materials and methods: Short-term HRV segments are analyzed for CHF diagnosis. Subjects' records from four public PhysioNet databases are considered (66 healthy subjects and 42 CHF subjects). The problem is approached from a machine learning perspective, by extracting 111 linear time domain, frequency domain, time-frequency, nonlinear, and symbolic dynamics HRV features. A multistage hybrid feature selection method is proposed that eventually eliminates most features. The method uses a symmetrical uncertainty filter, a Naive Bayes wrapper with best-first search, and a final greedy iterative feature elimination. For classification, we use rotation forest (RTF), radial basis function support vector machines (SVM), random forest (RF), multilayer perceptron artificial neural network, and k-nearest neighbors classifiers to evaluate the feature sets at each step of the process and to obtain as accurate a model as possible. Leave-one-subject-out cross-validation was used for evaluation, with two variants: subject-level (coarse-grained) and feature vector-level (fine-grained).
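As an illustration of the filter stage named above, the following is a minimal sketch of symmetrical uncertainty for discrete variables, using the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). It is not the authors' implementation: the function names and the toy data are hypothetical, and continuous HRV features are assumed to have been discretized beforehand.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so nothing can be shared
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


# Toy data (hypothetical): class labels and two already-discretized features.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
informative = [0, 0, 0, 0, 1, 1, 1, 1]    # mirrors the label exactly
uninformative = [0, 1, 0, 1, 0, 1, 0, 1]  # independent of the label

su_informative = symmetrical_uncertainty(informative, labels)
su_uninformative = symmetrical_uncertainty(uninformative, labels)
```

On this toy data the informative feature scores 1 and the uninformative one scores 0. A filter of this kind would rank features by their score against the class label and discard the low-ranked ones before the wrapper stage.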
Results: The results show that the feature selection method is capable of either improving or retaining the classification accuracy of the full feature set (RTF: subject-level ACC = 88.9%, feature vector-level ACC = 85.6%; SVM: subject-level ACC = 89.8%, feature vector-level ACC = 83.5%; RF: subject-level ACC = 87.0%, feature vector-level ACC = 85.5%), while greatly reducing the number of included features, to only four HRV features for RTF and RF, and only two HRV features for SVM. The best resulting models for subject-level classification are: RTF: ACC = 90.7%, SENS = 78.6%, SPEC = 98.6%, obtained with features: LF/HF ratio, maximum alphabet entropy, alphabet entropy variance, and HaarWaveletSD (scale = 8); SVM: ACC = 88.0%, SENS = 78.6%, SPEC = 93.9%, obtained with features: LF/HF ratio and Rate_U; RF: ACC = 90.7%, SENS = 78.6%, SPEC = 98.6%, obtained with features: LF/HF ratio, maximum alphabet entropy, Rate_U, and Rate_B. Other classifiers provided similar, but somewhat lower, results. A comparison of the procedure with the results of individual filter, wrapper, and simple hybrid approaches is provided, which demonstrates the efficiency of the proposed procedure. Conclusions: The results suggest that, with very few informative features, the method can achieve accurate, generalizable models for automated diagnosis of CHF from subjects' short-term HRV segments. The choice of the best features and the classification results are similar between the three best classifiers, so the use of any of them with the proposed method is recommended. Nonlinear and symbolic dynamics features are shown to have an important role in the resulting models. The presented methodology may be useful for first-hand screening for CHF as well as for similar diagnostic or automated detection problems in biomedicine.
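The distinction between subject-level and feature vector-level accuracy can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not say how segment predictions are combined per subject, so a majority vote is assumed here. The nearest-mean classifier, the function names, and the toy values are all hypothetical stand-ins.

```python
from collections import defaultdict


def nearest_mean_predict(train, x):
    """Hypothetical stand-in classifier: choose the class whose training mean is closest to x."""
    sums = defaultdict(lambda: [0.0, 0])
    for _, value, label in train:
        sums[label][0] += value
        sums[label][1] += 1
    return min(sums, key=lambda lab: abs(x - sums[lab][0] / sums[lab][1]))


def loso_evaluate(segments):
    """Leave-one-subject-out evaluation.

    segments: list of (subject_id, feature_value, label) tuples.
    Returns (vector_level_accuracy, subject_level_accuracy).
    """
    subjects = sorted({s for s, _, _ in segments})
    vec_correct = 0
    subj_correct = 0
    for held_out in subjects:
        # All segments of the held-out subject are excluded from training.
        train = [seg for seg in segments if seg[0] != held_out]
        test = [seg for seg in segments if seg[0] == held_out]
        preds = [nearest_mean_predict(train, v) for _, v, _ in test]
        true_label = test[0][2]
        # Fine-grained: every segment counts as one prediction.
        vec_correct += sum(p == true_label for p in preds)
        # Coarse-grained: one prediction per subject (majority vote is an assumption;
        # ties are not handled, the toy data has an odd segment count per subject).
        majority = max(set(preds), key=preds.count)
        subj_correct += majority == true_label
    return vec_correct / len(segments), subj_correct / len(subjects)


# Toy data (hypothetical): 4 subjects, 3 segments each, one feature; label 1 = CHF.
toy = [
    ("A", 1.0, 0), ("A", 1.2, 0), ("A", 2.6, 0),
    ("B", 0.9, 0), ("B", 1.1, 0), ("B", 1.0, 0),
    ("C", 3.0, 1), ("C", 3.2, 1), ("C", 1.4, 1),
    ("D", 2.9, 1), ("D", 3.1, 1), ("D", 3.0, 1),
]
vec_acc, subj_acc = loso_evaluate(toy)
```

On this toy data, 10 of 12 segments are classified correctly, while all 4 subjects are classified correctly after voting. This shows how the subject-level figure can exceed the feature vector-level figure when a subject has a few atypical segments.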

Full Text