Correlation-based Feature Selection Research Articles

Identification of convulsive epilepsy in sub-Saharan Africa relies on access to resources that are often unavailable. Infrastructure and resource requirements can further complicate case verification. Using machine-learning techniques, we have developed and tested a region-specific questionnaire panel and predictive model to identify people who have had a convulsive seizure. These findings have been implemented into a free app for health-care workers in Kenya, Uganda, Ghana, Tanzania, and South Africa.

In this retrospective case-control study, we used data from the Studies of the Epidemiology of Epilepsy in Demographic Sites in Kenya, Uganda, Ghana, Tanzania, and South Africa. We randomly split these individuals using a 7:3 ratio into a training dataset and a validation dataset. We used information gain and correlation-based feature selection to identify eight binary features to predict convulsive seizures. We then assessed several machine-learning algorithms to create a multivariate prediction model. We validated the best-performing model with the internal dataset and a prospectively collected external-validation dataset. We additionally evaluated a leave-one-site-out (LOSO) model, in which the model was trained on data from all sites except one, and the held-out site, each in turn, formed the validation dataset. We used these features to develop a questionnaire-based predictive panel that we implemented into a multilingual app (the Epilepsy Diagnostic Companion) for health-care workers in each geographical region.

We analysed epilepsy-specific data from 4097 people, of whom 1985 (48·5%) had convulsive epilepsy and 2112 were controls. From 170 clinical variables, we initially identified 20 candidate predictor features. Eight features were removed, six because of negligible information gain and two following review by a panel of qualified neurologists. Correlation-based feature selection identified eight variables that demonstrated predictive value; all but one were associated with an increased risk of an epileptic convulsion. The logistic regression, support vector, and naive Bayes models performed similarly, outperforming the decision-tree model. We chose the logistic regression model for its interpretability and implementability. The area under the receiver operating characteristic curve (AUC) was 0·92 (95% CI 0·91-0·94, sensitivity 85·0%, specificity 93·7%) in the internal-validation dataset and 0·95 (0·92-0·98, sensitivity 97·5%, specificity 82·4%) in the external-validation dataset. Similar results were observed for the LOSO model (AUC 0·94, 0·93-0·96, sensitivity 88·2%, specificity 95·3%).

On the basis of these findings, we developed the Epilepsy Diagnostic Companion as a predictive model and app offering a validated culture-specific and region-specific solution to confirm the diagnosis of a convulsive epileptic seizure in people with suspected epilepsy. The questionnaire panel is simple and accessible for health-care workers without specialist knowledge to administer. This tool can be iteratively updated and could lead to earlier, more accurate diagnosis of seizures and improve care for people with epilepsy.

Funding: The Wellcome Trust, the UK National Institute of Health Research, and the Oxford NIHR Biomedical Research Centre.
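
The abstract above does not say how its correlation-based feature selection step was computed. As a rough illustration only, the sketch below scores feature subsets with the standard CFS merit, k·r_cf / sqrt(k + k(k-1)·r_ff), where r_cf is the mean feature-to-class correlation and r_ff is the mean feature-to-feature correlation. It assumes absolute Pearson correlation as the correlation measure (for binary features this is the phi coefficient), which is a substitution made here for simplicity and not necessarily what the study used. The feature names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(vx * vy)


def cfs_merit(features, target, subset):
    """CFS merit of a subset: rewards correlation with the class,
    penalises correlation among the chosen features."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Hypothetical toy data: 8 people, binary outcome, binary questionnaire answers.
target = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "f_a": [1, 1, 1, 0, 0, 0, 0, 0],      # informative
    "f_dup": [1, 1, 1, 0, 0, 0, 0, 0],    # exact copy of f_a (redundant)
    "f_c": [0, 1, 1, 1, 0, 0, 0, 0],      # informative, partly complementary to f_a
    "f_noise": [1, 0, 1, 0, 1, 0, 1, 0],  # uncorrelated with the outcome
}

for subset in (["f_a"], ["f_a", "f_dup"], ["f_a", "f_c"], ["f_a", "f_noise"]):
    print(subset, round(cfs_merit(features, target, subset), 3))
```

In this toy setting, adding the duplicate feature leaves the merit unchanged, adding the complementary feature raises it, and adding the noise feature lowers it, which is the behaviour that lets CFS prefer small, non-redundant subsets.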

Context: Applying software fault prediction (SFP) in the software development life cycle to predict faulty classes at an early stage has attracted the interest of many researchers. In the SFP domain, very little work has addressed the class imbalance and feature redundancy problems jointly to improve the performance and prediction accuracy of SFP models. The literature survey also shows a scarcity of studies that offer a comprehensive comparative analysis of different sampling and feature selection strategies together. Objective: This research presents an extensive assessment of combinations of different feature selection and sampling approaches to overcome the problems of class overlap, class imbalance, and feature redundancy. The objective is to determine the combination that produces the most accurate and effective SFP model. Method: The study applied 8 sampling techniques along with 10 feature selection algorithms to 56 open-source projects. The comparative analysis is performed on 5346 variants of the input datasets by applying 8 classifiers to predict the faulty class. In addition, the paper assesses the performance of each technique individually on all the input projects. Accuracy and area under the receiver operating characteristic (ROC) curve (AUC) are used as performance metrics to compare the models developed with the classification algorithms. Result: For each project, we evaluated a total of 792 combinations produced from 10 feature selection methods plus 1 all-metrics dataset, 8 sampling methods plus 1 original unsampled dataset, and 8 classifiers. The empirical results indicate that, for 21 of 54 projects (38.89%), the combination of Synthetic Minority Over Sampling Technique Edited (SMOTEE) with correlation-based feature selection (FS2) achieved the highest AUC value. Additionally, 24.07% of projects attained their highest AUC values with the SMOTEE, FS2, and RF combination. Conclusion: The statistical test results reveal that 93.42% of the pairs of sampling and feature selection combinations differed significantly in performance. The empirical results indicate that the performance of the SFP model is adversely affected by class imbalance and irrelevance. For more than 75% of projects, the performance of the trained models improved after sampling and feature selection were applied, with AUC values ranging from 0.805 to 0.99, compared with models built without them.
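
The figure of 792 combinations per project follows from the counts given in the abstract: (10 + 1) feature-set options × (8 + 1) sampling options × 8 classifiers. The short sketch below enumerates that grid and re-derives the 38.89% share from 21 of 54 projects. Only the label FS2 comes from the abstract; every other label (ALL, NONE, FS1, S1, C1, and so on) is a hypothetical placeholder, since the abstract does not name the individual methods.

```python
from itertools import product

# 10 feature selection methods plus the unfiltered "all metrics" dataset.
feature_sets = ["ALL"] + [f"FS{i}" for i in range(1, 11)]
# 8 sampling methods plus the original, unsampled dataset.
samplers = ["NONE"] + [f"S{i}" for i in range(1, 9)]
# 8 classifiers (placeholder labels).
classifiers = [f"C{i}" for i in range(1, 9)]

# Every (feature set, sampler, classifier) triple evaluated for one project.
grid = list(product(feature_sets, samplers, classifiers))
print(len(grid))


def share(count, total):
    """Percentage of projects, rounded to two decimals as in the abstract."""
    return round(100 * count / total, 2)


print(share(21, 54))
```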

Articles published on Correlation-based Feature Selection

GAAMmf: genetic algorithm with aggressive mutation and decreasing feature set for feature selection

Estimation of Winter Wheat SPAD Values Based on UAV Multispectral Remote Sensing

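Investigation of the Effect of Feature Selection Methods on the Prediction Success of Overall Equipment Effectiveness (Öznitelik Seçim Yöntemlerinin Toplam Ekipman Etkinliği Tahmin Başarısı Üzerindeki Etkisinin Araştırılması)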

An exploratory in N-doped carbon dots as green fluorescence probes for Hg(II) ions detection

Experimental and data driven measurement of engine dynamometer bearing lifespan using acoustic emission

Toward the automatic detection of effective gerbil holes in desert grasslands through unmanned aerial vehicle imagery

Development of an online Nigella sativa inspection system equipped with machine vision technology and artificial neural networks

Gene Expression Classification for Biomarker Identification in Maize Subjected to Various Biotic Stresses.

Investigating Feature Selection Techniques to Enhance the Performance of EEG-Based Motor Imagery Tasks Classification

Development and validation of a diagnostic aid for convulsive epilepsy in sub-Saharan Africa: a retrospective case-control study.

Transcriptomic data in tumor-adjacent normal tissues harbor prognostic information on multiple cancer types.

Empirical evaluation of the performance of data sampling and feature selection techniques for software fault prediction

A Multi-level Random Forest Model-Based Intrusion Detection Using Fuzzy Inference System for Internet of Things Networks

Network intrusion detection using data dimensions reduction techniques

Improved Stress Classification Using Automatic Feature Selection from Heart Rate and Respiratory Rate Time Signals

Photoplethysmography Signal Wavelet Enhancement and Novel Features Selection for Non-Invasive Cuff-Less Blood Pressure Monitoring.

Quantitative Prediction of Inorganic Nanomaterial Cellular Toxicity via Machine Learning.

SMMO-CoFS: Synthetic Multi-minority Oversampling with Collaborative Feature Selection for Network Intrusion Detection System

Tree species classification in a typical natural secondary forest using UAV-borne LiDAR and hyperspectral data

Comparison of Logistic Regression and Random Forest using Correlation-based Feature Selection for Phishing Website Detection
