Hybrid learning method based on feature clustering and scoring for enhanced COVID-19 breath analysis by an electronic nose

Shidiq Nur Hidayat,Trisna Julian,Agus Budi Dharmawan,Mayumi Puspita,Lily Chandra,Abdul Rohman,Madarina Julia,Aditya Rianjanu,Dian Kesumapramudya Nurputra,Kuwat Triyana,Hutomo Suryo Wasisto

doi:10.1016/j.artmed.2022.102323

Abstract

Breath pattern analysis based on an electronic nose (e-nose), which is a noninvasive, fast, and low-cost method, has been continuously used for detecting human diseases, including the coronavirus disease 2019 (COVID-19). Nevertheless, having big data with several available features is not always beneficial because only a few of them will be relevant and useful to distinguish different breath samples (i.e., positive and negative COVID-19 samples). In this study, we develop a hybrid machine learning-based algorithm combining hierarchical agglomerative clustering analysis and permutation feature importance method to improve the data analysis of a portable e-nose for COVID-19 detection (GeNose C19). Utilizing this learning approach, we can obtain an effective and optimum feature combination, enabling the reduction by half of the number of employed sensors without downgrading the classification model performance. Based on the cross-validation test results on the training data, the hybrid algorithm can result in accuracy, sensitivity, and specificity values of (86 ± 3)%, (88 ± 6)%, and (84 ± 6)%, respectively. Meanwhile, for the testing data, a value of 87% is obtained for all the three metrics. These results exhibit the feasibility of using this hybrid filter-wrapper feature-selection method to pave the way for optimizing the GeNose C19 performance.

Full Text