Correction: Advancements in Hybrid Machine Learning Models for Biomedical Disease Classification Using Integration of Hyperparameter-Tuning and Feature Selection Methodologies: A Comprehensive Review
- Research Article
46
- 10.1007/s13246-022-01106-6
- Jan 31, 2022
- Physical and engineering sciences in medicine
Knee Osteoarthritis (KOA) is a degenerative joint disease of the knee that results from the progressive loss of cartilage. Due to KOA’s multifactorial nature and the poor understanding of its pathophysiology, there is a need for reliable tools that will reduce diagnostic errors made by clinicians. The existence of public databases has facilitated the advent of advanced analytics in KOA research; however, the heterogeneity of the available data, along with the observed high feature dimensionality, makes this diagnosis task difficult. The objective of the present study is to provide a robust Feature Selection (FS) methodology that could: (i) handle the multidimensional nature of the available datasets and (ii) alleviate the shortcomings of existing feature selection techniques in identifying important risk factors that contribute to KOA diagnosis. To this end, we used multidimensional data obtained from the Osteoarthritis Initiative database for individuals with and without KOA. The proposed fuzzy ensemble feature selection methodology aggregates the results of several FS algorithms (filter, wrapper and embedded ones) based on fuzzy logic. The effectiveness of the proposed methodology was evaluated using an extensive experimental setup that involved multiple competing FS algorithms and several well-known ML models. A 73.55% classification accuracy was achieved by the best performing model (Random Forest classifier) on a group of twenty-one selected risk factors. Explainability analysis was finally performed to quantify the impact of the selected features on the model’s output, thus enhancing our understanding of the rationale behind the decision-making mechanism of the best model.
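The ensemble idea in the abstract above (combining the outputs of filter, wrapper and embedded selectors) can be illustrated with a minimal sketch. It does not reproduce the study's fuzzy-logic aggregation; it swaps in a plain mean-rank (Borda-style) vote as a stand-in. The feature names, scores, and the helper names `ranks` and `aggregate` are all hypothetical.

```python
from statistics import mean


def ranks(scores):
    """Map each feature to its rank (1 = most important) under one selector."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {feature: position for position, feature in enumerate(ordered, start=1)}


def aggregate(selector_scores, top_k):
    """Average the per-selector ranks and keep the top_k features.

    Mean-rank voting is an assumption made for illustration; it is not
    the fuzzy aggregation described in the abstract.
    """
    per_selector = [ranks(scores) for scores in selector_scores.values()]
    features = list(per_selector[0])
    mean_rank = {f: mean(r[f] for r in per_selector) for f in features}
    # Ties on mean rank are broken alphabetically so the result is deterministic.
    return sorted(mean_rank, key=lambda f: (mean_rank[f], f))[:top_k]


# Hypothetical importance scores from three selector families.
selector_scores = {
    "filter": {"age": 0.9, "bmi": 0.7, "pain": 0.4, "height": 0.1},
    "wrapper": {"age": 0.6, "bmi": 0.8, "pain": 0.5, "height": 0.2},
    "embedded": {"age": 0.7, "bmi": 0.3, "pain": 0.9, "height": 0.1},
}

selected = aggregate(selector_scores, top_k=2)
print(selected)
```

Averaging ranks rather than raw scores keeps selectors with different score scales comparable, which is one common reason to aggregate at the rank level.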
- Research Article
82
- 10.1016/j.aei.2016.05.005
- Jun 14, 2016
- Advanced Engineering Informatics
A data mining based load forecasting strategy for smart electrical grids
- Research Article
2
- 10.3233/jifs-213298
- Sep 22, 2022
- Journal of Intelligent & Fuzzy Systems
The use of recycled glass in the concrete mix instead of natural coarse aggregates and supplemental cementitious material has several advantages, including the conservation of natural resources, the reduction of CO2 emissions, and cost savings. However, due to their qualities, the mechanical properties of concrete containing Ground Glass Particles (GGP) differ from those of natural-aggregate concrete. As a result, assessing the compressive strength (CS) of concrete with GGP is crucial. Therefore, this paper proposes a hybrid Machine Learning (ML) model combining the Gradient Boosting (GB) and Bayesian optimization (BO) algorithms for predicting the compressive strength of concrete containing GGP. The hybrid ML model is developed and validated based on the training dataset (70% of the data) and the test dataset (the remaining 30%), respectively. The performance of the hybrid ML model is evaluated by three criteria: the Pearson correlation coefficient (R), Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). The K-Fold Cross-Validation technique is also used to verify the reliability of the hybrid ML model. The best performance of the hybrid ML model is R = 0.9843, RMSE = 1.7256 MPa, and MAE = 1.3154 MPa for the training dataset and R = 0.9784, RMSE = 2.4338 MPa, and MAE = 1.9618 MPa for the testing dataset. Based on the best hybrid ML model, a sensitivity analysis using SHapley Additive exPlanation (SHAP) and 2D Partial Dependence Plots (PDP) is carried out to examine in depth the effect of each input variable on the predicted compressive strength of concrete containing GGP. The sensitivity analysis shows that four factors (curing age, surface area, TiO2, and temperature) have the greatest effect on the compressive strength of concrete containing GGP.
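The three evaluation criteria named in the abstract above (R, RMSE and MAE) can be computed as in the following minimal sketch. The strength values are hypothetical and are not taken from the study; the function names are chosen here for illustration.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_true = sum((t - mean_true) ** 2 for t in y_true)
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    return cov / math.sqrt(var_true * var_pred)


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the target."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


# Hypothetical compressive strengths in MPa (illustrative only).
measured = [30.0, 40.0, 50.0, 60.0]
predicted = [32.0, 38.0, 52.0, 58.0]

r_value = pearson_r(measured, predicted)
rmse_value = rmse(measured, predicted)
mae_value = mae(measured, predicted)
print(round(r_value, 4), rmse_value, mae_value)
```

RMSE penalises large errors more heavily than MAE, so reporting both, as the abstract does, gives a sense of whether a few large misses dominate the error.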
- Research Article
- 10.56536/jicet.v5i1.188
- Apr 18, 2025
- Journal of Innovative Computing and Emerging Technologies
Abstract: Angiography serves as a vital imaging technique for diagnosing and treating cardiovascular disease by enabling precise visualization of coronary arteries. This study investigates the impact of multiple feature selection strategies on enhancing the predictive accuracy of machine learning (ML) and deep learning (DL) models in classifying angiography outcomes. The dataset consists of 976 angiography videos, categorized into three classes: Normal, Coronary Artery Bypass Grafting (CABG), and Angioplasty. [1] To ensure data reliability, cases related to Percutaneous Transvenous Mitral Commissurotomy (PTMC) and cases with insufficient patient records were excluded. A comparative analysis was conducted using traditional ML models such as Support Vector Machines (SVM), Random Forest (RF), and Gradient Boosting Machines (GBM) alongside deep learning architectures like Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks. [2] Various feature selection techniques, including Principal Component Analysis (PCA), Recursive Feature Elimination (RFE), and Mutual Information (MI), were employed to enhance model interpretability and efficiency. [3] Results demonstrate that ML models achieved moderate accuracy, ranging between 85% and 90%, while DL models significantly outperformed them, reaching an accuracy of 91%. The superior performance of DL approaches can be attributed to their ability to automatically extract hierarchical spatial and temporal features from angiography videos, whereas ML models rely heavily on handcrafted features, limiting their effectiveness [4]. These findings emphasize the critical role of feature selection and deep learning methodologies in improving angiography outcome predictions, ultimately contributing to AI-driven advancements in cardiovascular diagnostics.
The study underscores the potential of DL-based automated angiography assessment in supporting clinical decision-making, leading to improved patient management and early intervention strategies. [5]
- Book Chapter
- 10.71443/9788197933608-02
- Feb 17, 2025
The rapid advancements in computing technologies, especially quantum computing, pose significant challenges to traditional encryption methods, compelling the need for more robust, adaptive, and scalable solutions. Hybrid machine learning (ML) models have emerged as a promising approach to address these challenges, offering enhanced security, performance, and scalability. This book chapter explores the intersection of hybrid ML models and encryption methodologies, focusing on how these models can transform data encryption techniques for secure communication. By integrating various ML techniques such as supervised, unsupervised, and reinforcement learning, hybrid models provide adaptive encryption strategies that can dynamically respond to emerging threats and evolving system requirements. The chapter delves into the application of hybrid ML models in quantum-safe encryption, key management systems, and real-time adaptive encryption, showcasing case studies that demonstrate their practical impact in securing data in both traditional and quantum computing environments. Through comprehensive analysis, this chapter highlights the potential of hybrid ML models to optimize encryption efficiency, enhance key exchange protocols, and ensure the scalability of encryption systems, paving the way for a secure and future-proof communication infrastructure.
- Research Article
18
- 10.3390/healthcare9030260
- Mar 1, 2021
- Healthcare
Knee osteoarthritis (KOA) is a multifactorial disease that is responsible for more than 80% of the total burden of osteoarthritis. KOA is heterogeneous in terms of rates of progression, with several different phenotypes and a large number of risk factors, which often interact with each other. A number of modifiable and non-modifiable systemic and mechanical parameters, along with comorbidities and pain-related factors, contribute to the development of KOA. Although models exist to predict the onset of the disease or discriminate between asymptomatic and OA patients, there are just a few studies in the recent literature that focus on the identification of risk factors associated with KOA progression. This paper contributes to the identification of risk factors for KOA progression via a robust feature selection (FS) methodology that overcomes two crucial challenges: (i) the observed high dimensionality and heterogeneity of the available data that are obtained from the Osteoarthritis Initiative (OAI) database and (ii) a severe class imbalance problem posed by the fact that the KOA progressors class is significantly smaller than the non-progressors’ class. The proposed feature selection methodology relies on a combination of evolutionary algorithms and machine learning (ML) models, leading to the selection of a relatively small feature subset of 35 risk factors that generalizes well on the whole dataset (mean accuracy of 71.25%). We investigated the effectiveness of the proposed approach in a comparative analysis with well-known FS techniques with respect to metrics related to both prediction accuracy and generalization capability. The impact of the selected risk factors on the prediction output was further investigated using SHapley Additive exPlanations (SHAP).
The proposed FS methodology may contribute to the development of new, efficient risk stratification strategies and identification of risk phenotypes of each KOA patient to enable appropriate interventions.
- Research Article
52
- 10.1109/jiot.2018.2806990
- Dec 1, 2018
- IEEE Internet of Things Journal
In recent years, multiple Internet of Things (IoT) solutions have been developed to detect, track, count, and identify human activity from people who neither carry any device nor participate actively in the detection process. When WiFi radio receivers are employed as sensors for device-free human activity recognition, channel quality measurements are preprocessed in order to extract predictive features toward performing the desired activity recognition via machine learning (ML) models. Despite the variety of predictors in the literature, there is no universally outperforming set of features for all scenarios and applications. However, certain feature combinations could achieve a better average detection performance compared to the use of a thorough feature portfolio. Such predictors are often obtained by feature engineering and selection techniques applied before the learning process. This manuscript elaborates on the feature engineering and selection methodology for counting device-free people by solely resorting to the fluctuation and variation of WiFi signals exchanged by IoT devices. We comprehensively review the feature engineering and ML models employed in the literature from a critical perspective, identifying trends, research niches, and open challenges. Furthermore, we present and provide the community with a new open database with WiFi measurements in several indoor environments (i.e., rooms, corridors, and stairs) where up to five people can be detected. This dataset is used to exhaustively assess the performance of different ML models with and without feature selection, from which insightful conclusions are drawn regarding the predictive potential of different predictors across scenarios of diverse characteristics.
- Research Article
18
- 10.1016/j.jechem.2024.04.022
- Apr 25, 2024
- Journal of Energy Chemistry
This study investigates the dry reformation of methane (DRM) over Ni/Al2O3 catalysts in a dielectric barrier discharge (DBD) non-thermal plasma reactor. A novel hybrid machine learning (ML) model is developed to optimize the plasma-catalytic DRM reaction with limited experimental data. To address the non-linear and complex nature of the plasma-catalytic DRM process, the hybrid ML model integrates three well-established algorithms: regression trees, support vector regression, and artificial neural networks. A genetic algorithm (GA) is then used to optimize the hyperparameters of each algorithm within the hybrid ML model. The ML model achieved excellent agreement with the experimental data, demonstrating its efficacy in accurately predicting and optimizing the DRM process. The model was subsequently used to investigate the impact of various operating parameters on the plasma-catalytic DRM performance. We found that the optimal discharge power (20 W), CO2/CH4 molar ratio (1.5), and Ni loading (7.8 wt%) resulted in the maximum energy yield at a total flow rate of ∼51 mL/min. Furthermore, we investigated the relative significance of each operating parameter on the performance of the plasma-catalytic DRM process. The results show that the total flow rate had the greatest influence on the conversion, with a significance exceeding 35% for each output, while the Ni loading had the least impact on the overall reaction performance. This hybrid model demonstrates a remarkable ability to extract valuable insights from limited datasets, enabling the development and optimization of more efficient and selective plasma-catalytic chemical processes.
- Book Chapter
- 10.1016/b978-0-443-16147-6.00036-0
- Jan 1, 2024
- Decision-Making Models
Chapter 27 - Comparison of machine learning models for lung cancer prediction using different feature selection methodologies
- Book Chapter
5
- 10.1007/978-3-030-22871-2_39
- Jan 1, 2019
Text classification is a well-known machine learning approach for simplifying domain-specific investigation. Consequently, it is frequently utilized in the field of sentiment analysis. Demanding business requirements call for new techniques and approaches to improve the performance of sentiment analysis. In this context, an ensemble of classifiers is one of the promising approaches to improve classification accuracy. However, classifier ensembles are usually built for classification while ignoring the significance of feature selection. With the right feature selection methodology, classification accuracy can be significantly improved even when the classification is performed through a single classifier. This article presents a novel feature selection ensemble approach for sentiment classification. Firstly, the combination of three well-known features (i.e. lexicon, phrases and unigram) is introduced. Secondly, a two-level ensemble is proposed for feature selection by exploiting Gini Index (GI), Information Gain (IG), Support Vector Machine (SVM) and Logistic Regression (LR). Subsequently, the classification is performed through an SVM classifier. The proposed approach is implemented in the GATE and RapidMiner tools. Furthermore, two benchmark datasets, frequently utilized in the domain of sentiment classification, are used for experimental evaluation. The experimental results show that the proposed ensemble approach significantly improves the performance of sentiment classification with respect to well-known state-of-the-art approaches. Furthermore, the analysis indicates that an ensemble of classifiers is not necessarily needed to improve classification accuracy when the right feature selection methodology is in place.
- Research Article
41
- 10.3390/w15091750
- May 2, 2023
- Water
Developing precise soft computing methods for groundwater management, which includes quality and quantity, is crucial for improving water resources planning and management. In the past 20 years, significant progress has been made in groundwater management using hybrid machine learning (ML) models as artificial intelligence (AI). Although various review articles have reported advances in this field, existing literature must cover groundwater management using hybrid ML. This review article aims to understand the current state-of-the-art hybrid ML models used for groundwater management and the achievements made in this domain. It includes the most cited hybrid ML models employed for groundwater management from 2009 to 2022. It summarises the reviewed papers, highlighting their strengths and weaknesses, the performance criteria employed, and the most highly cited models identified. Notably, accuracy was significantly enhanced, demonstrating a robust outcome. Additionally, this article outlines recommendations for future research directions to enhance the accuracy of groundwater management, including prediction models, and to enhance related knowledge.
- Conference Article
10
- 10.1109/bsn.2017.7936039
- May 1, 2017
Inertial measurement unit (IMU) based systems are becoming increasingly popular in the classification of human movement. While research in the area has established the utility of various machine learning classification methods, there is a paucity of evidence investigating the effect of feature selection on classification efficacy. The aim of this study was therefore to investigate the influence of feature selection methodology on the classification accuracy of human movement data. The efficacy of four commonly used feature selection and classification methods was compared using four IMU human movement data sets. Optimisation of classification and feature selection methodologies resulted in an overall improvement in F1 score of between 1–8% for all four data sets. The findings from this study illustrate the need for researchers to consider the effect classification and feature selection methodologies may have on system efficacy.
- Research Article
68
- 10.1016/j.apacoust.2023.109492
- Jun 28, 2023
- Applied Acoustics
Emotional speech Recognition using CNN and Deep learning techniques
- Book Chapter
- 10.71443/9788197933646-13
- Dec 7, 2024
Feature selection is a critical process in machine learning that enhances model performance by identifying the most relevant features from high-dimensional datasets. This book chapter delves into various feature selection methodologies, emphasizing the significance of hybrid approaches that leverage the strengths of different techniques, including filter, wrapper, and embedded methods. It critically examines the advantages and limitations of each method, providing insights into their applicability across diverse domains. The chapter also explores future directions in hybrid feature selection, including the integration of advanced algorithms, adaptation to varying data types, and the incorporation of domain knowledge. Emphasizing computational efficiency and real-time application potential, this work serves as a comprehensive guide for researchers and practitioners aiming to enhance machine learning models through effective feature selection. The findings and discussions presented herein contribute to the ongoing discourse in the field and provide a roadmap for future research initiatives.
- Research Article
6
- 10.1007/s11265-006-0026-5
- Mar 27, 2007
- The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology
In pattern recognition, a suitable criterion for feature selection is the mutual information (MI) between feature vectors and class labels. Estimating MI in high dimensional feature spaces is problematic in terms of computation load and accuracy. We propose an independent component analysis based MI estimation (ICA-MI) methodology for feature selection. This simplifies the high dimensional MI estimation problem into multiple one-dimensional MI estimation problems. Nonlinear ICA transformation is achieved using piecewise local linear approximation on partitions in the feature space, which allows the exploitation of the additivity property of entropy and the simplicity of linear ICA algorithms. The number of partitions controls the tradeoff between more accurate approximation of the nonlinear data topology and small-sample statistical variations in estimation. We test the ICA-MI feature selection framework on synthetic, UCI repository, and EEG activity classification problems. Experiments demonstrate, as expected, that the selection of the number of partitions for local linear ICA is highly problem dependent and must be carried out properly through cross validation. When this is done properly, the proposed ICA-MI feature selection framework yields feature ranking results that are comparable to the optimal probability of error based feature ranking and selection strategy at a much lower computational load.
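The reduction to one-dimensional problems mentioned in the abstract above can be made explicit with a short derivation. This is a sketch under stated assumptions, not the authors' exact formulation. The notation is introduced here: $\mathbf{x}$ is the feature vector, $c$ the class label, $\mathbf{y} = g(\mathbf{x})$ an invertible ICA transform with $d$ components. The components $y_i$ are assumed independent both marginally and within each class.

```latex
\begin{aligned}
I(\mathbf{x}; c)
  &= I(\mathbf{y}; c)
     && \text{(MI is invariant under invertible maps)} \\
  &= H(\mathbf{y}) - H(\mathbf{y} \mid c) \\
  &= \sum_{i=1}^{d} H(y_i) - \sum_{i=1}^{d} H(y_i \mid c)
     && \text{(additivity of entropy under independence)} \\
  &= \sum_{i=1}^{d} I(y_i; c)
\end{aligned}
```

Each term $I(y_i; c)$ involves only a scalar component and the label, so it can be estimated with one-dimensional density estimates instead of a single $d$-dimensional one.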