A hybrid machine learning approach of fuzzy-rough-k-nearest neighbor, latent semantic analysis, and ranker search for efficient disease diagnosis
Machine learning approaches make a valuable contribution to improving the competency of automated decision systems. Past studies have developed several machine learning approaches for predicting the diagnosis of individual diseases. The present study aims to develop a hybrid machine learning approach for diagnosis prediction across multiple diseases, based on the combination of efficient feature generation, feature selection, and classification methods. Specifically, the combination of latent semantic analysis, ranker search, and fuzzy-rough-k-nearest neighbor is proposed and validated for diagnosis prediction on primary tumor, post-operative, breast cancer, lymphography, audiology, fertility, immunotherapy, and COVID-19 datasets, among others. The performance of the proposed approach is compared with single and other hybrid machine learning approaches in terms of accuracy, analysis time, precision, recall, F-measure, area under the ROC curve, and the Kappa coefficient. The proposed hybrid approach outperforms single and other hybrid approaches in diagnosis prediction for each of the selected diseases. In particular, the suggested approach achieved maximum recognition accuracies of 99.12% for primary tumor, 96.45% for breast cancer Wisconsin, 94.44% for cryotherapy, and 93.81% for audiology, with significant improvements in classification accuracy and the other evaluation metrics for the remaining diseases. Besides, it handles missing values in the dataset effectively.
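The three-stage pipeline this abstract describes can be sketched roughly as follows. The toy low-rank data, the correlation-based ranker, and the distance-weighted membership rule (a simplified fuzzy k-NN, not the paper's full fuzzy-rough formulation) are all illustrative assumptions:

```python
# Sketch: LSA-style feature generation (truncated SVD), a simple ranker,
# and a fuzzy (distance-weighted) k-NN vote. Data and the fuzziness
# parameter m are illustrative, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(60, 3))                      # hidden factors
X = G @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(60, 10))
y = (G[:, 0] > 0).astype(int)                     # toy diagnosis label

# 1) Feature generation: truncated SVD (the core of latent semantic analysis)
U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
k = 4
Z = U[:, :k] * s[:k]                              # samples in k latent axes

# 2) Ranker search: order latent features by absolute correlation with y
scores = np.abs([np.corrcoef(Z[:, j], y)[0, 1] for j in range(k)])
Z = Z[:, np.argsort(scores)[::-1][:3]]            # keep 3 best-ranked features

# 3) Fuzzy k-NN: class memberships weighted by inverse distance (m = 2)
def fuzzy_knn_predict(Z_train, y_train, z, k=5, m=2):
    d = np.linalg.norm(Z_train - z, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / np.maximum(d[idx], 1e-9) ** (2 / (m - 1))
    member = [w[y_train[idx] == c].sum() for c in (0, 1)]
    return int(np.argmax(member))

preds = [fuzzy_knn_predict(np.delete(Z, i, 0), np.delete(y, i), Z[i])
         for i in range(len(y))]                  # leave-one-out evaluation
accuracy = float(np.mean(np.array(preds) == y))
print(f"LOO accuracy: {accuracy:.2f}")
```

A full fuzzy-rough variant would additionally weight neighbors by lower/upper approximation memberships rather than distance alone.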
- Research Article
- 10.11591/eei.v13i5.8004
- Oct 1, 2024
- Bulletin of Electrical Engineering and Informatics
Nowadays, smartphones blend seamlessly into every aspect of our lives, including as handheld assistants for individuals with disabilities. This research therefore addresses the need for a robust system that can classify Kazakh banknotes. Capitalizing on the availability of smartphones and the ability to integrate detectors with classifiers, this study introduces classifiers of Kazakh banknote images, specifically designed for denominations ranging from 500 KZT to 20,000 KZT. It compares traditional and hybrid machine learning (ML) approaches, utilizing a dataset of diverse banknote images, aiming for both lightweight models and high accuracy. The traditional approach, enhanced by thoughtful feature engineering, demonstrates competitive performance. The hybrid approach, utilizing features from a pre-trained ResNet-18 model, showcases remarkable accuracy and robustness. Evaluation metrics reveal significant achievements, with the traditional approach attaining 94.00% accuracy and the hybrid approach excelling at 99.11%. Model stacking, combining classifiers from both approaches, outperforms the individual classifiers, achieving 95.00% and 99.55% accuracy for the traditional and hybrid ML approaches, respectively. Our methodology's comparable outcomes in classifying Thai banknotes and coffee bean roasting levels demonstrate its versatility in image classification tasks that rely on color differentiation, showcasing potential beyond banknote recognition.
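The model-stacking step mentioned above can be sketched with scikit-learn: base classifiers' out-of-fold predictions train a meta-learner. The synthetic six-class data (standing in for six denominations) and the choice of base models are assumptions; the paper's actual classifiers and ResNet-18 features are not reproduced here:

```python
# Hedged sketch of model stacking: two base classifiers combined by a
# logistic-regression meta-learner trained on out-of-fold predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                           n_classes=6, random_state=0)  # six "denominations"
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("knn", KNeighborsClassifier(n_neighbors=5))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5)                       # out-of-fold predictions train the meta-model
stack.fit(X_tr, y_tr)
print(f"stacked accuracy: {stack.score(X_te, y_te):.2f}")
```

In the paper's setting the base estimators would include the feature-engineered traditional classifiers and the ResNet-18-feature classifiers.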
- Research Article
8
- 10.1016/j.apgeochem.2023.105731
- Jun 27, 2023
- Applied Geochemistry
A chemistry-informed hybrid machine learning approach to predict metal adsorption onto mineral surfaces
- Research Article
2
- 10.1115/1.4064478
- Mar 5, 2024
- Journal of Computing and Information Science in Engineering
Vortex core detection remains an unsolved problem in the field of experimental and computational fluid dynamics. Available methods such as the Q, delta, and swirling strength criteria are based on the decomposed velocity gradient tensor but detect spurious vortices (false positives and false negatives), making these methods less robust. To overcome this, we propose a new hybrid machine learning approach in which we use a convolutional neural network to detect vortex regions within surface streamline plots and an additional deep neural network to detect vortex cores within the identified vortex regions. Furthermore, we propose an automatic labeling approach based on K-means clustering to preprocess our input images. We show results for two classical test cases in fluid mechanics: the Taylor–Green vortex problem and two rotating blades. We show that our hybrid approach is up to 2.6 times faster than a pure deep neural network-based approach, and that our automatic K-means clustering labeling approach comes within 0.45% mean square error of the more labour-intensive manual labeling approach. At the same time, by using a sufficient number of samples, we show that we are able to eliminate false positives and negatives entirely, and thus that our hybrid machine learning approach is a viable alternative to currently used vortex detection tools in fluid mechanics applications.
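The automatic K-means labeling idea can be sketched on a scalar image: cluster pixel values into two groups and take the higher-valued cluster as the vortex mask. The synthetic "swirl strength" field and the choice of k = 2 are illustrative assumptions, not the paper's streamline plots:

```python
# Sketch: 1-D K-means (plain Lloyd iterations) as an automatic labeler
# that separates "vortex" pixels from background in a synthetic field.
import numpy as np

rng = np.random.default_rng(1)
xx, yy = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64))
field = np.exp(-((xx - 0.4) ** 2 + yy ** 2) / 0.02)  # one compact "vortex"
field += 0.05 * rng.random(field.shape)              # background noise

def kmeans_1d(values, k=2, iters=25):
    """Lloyd iterations on a flat array of scalars, quantile-initialized."""
    centers = np.quantile(values, np.linspace(0.1, 0.9, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        centers = np.array([values[labels == j].mean() for j in range(k)])
    return labels, centers

labels, centers = kmeans_1d(field.ravel())
vortex_mask = (labels == np.argmax(centers)).reshape(field.shape)
print("pixels labeled vortex:", int(vortex_mask.sum()))
```

The resulting mask would then serve as training labels for the CNN region detector, replacing manual annotation.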
- Research Article
4
- 10.1177/03093247251337987
- May 10, 2025
- The Journal of Strain Analysis for Engineering Design
This study investigates the buckling behavior of columns with variable cross-sections using analytical, numerical, and hybrid machine learning (ML) approaches. Initially, the power series method is employed to calculate the buckling loads of columns with both constant and varying cross-sections under diverse boundary conditions. Then finite element (FE) analyses of the columns are performed to obtain the buckling loads, and the results are validated by comparing them with those from the power series method. Once validated, the FE model is used to generate a large dataset encompassing a wide range of cross-sections, lengths, and material properties, per the samples obtained through the Sobol sampling method. A hybrid ML model is then developed by integrating the XGBoost algorithm with the particle swarm optimization (PSO) technique for hyperparameter tuning. This hybrid PSO-XGBoost model is trained to predict the buckling loads of columns with varying cross-sections. Its performance on input parameters outside the training dataset is evaluated using statistical metrics and scatter plots. The results demonstrate excellent agreement between the FE analysis and the power series method, confirming the reliability of both approaches. The PSO-XGBoost model achieved remarkable predictive accuracy, with R² values of 0.999 and 0.996 for the training and testing datasets, respectively. Furthermore, SHapley Additive exPlanations (SHAP) analysis is conducted to explore the influence and interactions of the input parameters on buckling loads, providing valuable insights into the model's interpretability and the underlying mechanics of column buckling.
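The PSO loop used for hyperparameter tuning can be sketched generically. Here the Rosenbrock function stands in for the objective; in the paper's setting it would be a cross-validated XGBoost error over (learning rate, depth, ...). Swarm size and the inertia/cognitive/social coefficients are illustrative assumptions:

```python
# Minimal particle swarm optimization loop of the kind used for
# hyperparameter tuning; the objective is a stand-in.
import numpy as np

def rosenbrock(p):                       # stand-in objective to minimize
    return (1 - p[0]) ** 2 + 100 * (p[1] - p[0] ** 2) ** 2

rng = np.random.default_rng(0)
n, dim, iters = 30, 2, 200
w, c1, c2 = 0.7, 1.5, 1.5                # inertia, cognitive, social weights

pos = rng.uniform(-2, 2, (n, dim))
vel = np.zeros((n, dim))
pbest = pos.copy()
pbest_val = np.array([rosenbrock(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)].copy()

for _ in range(iters):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    # velocity update: pull toward personal and global bests
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([rosenbrock(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

print("best point:", np.round(gbest, 3))   # should approach (1, 1)
```

Wrapping an XGBoost cross-validation score in place of `rosenbrock` turns this loop into the PSO-XGBoost tuner the abstract describes.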
- Research Article
123
- 10.1109/access.2021.3062909
- Jan 1, 2021
- IEEE Access
A distributed denial of service (DDoS) attack represents a major threat to service providers. More specifically, a DDoS attack aims to disrupt and deny services to legitimate users by overwhelming the target with a massive number of malicious requests. A cyberattack of this kind is likely to result in tremendous economic losses for businesses and service providers by increasing both operating and financial costs. In recent years, machine learning (ML) techniques have been widely used to prevent DDoS attacks. Indeed, many defense systems have been transformed into smart and intelligent systems through the use of ML techniques, which allow them to defeat DDoS attacks. This paper analyzes recent studies concerning DDoS detection methods that have adopted single and hybrid ML approaches in modern networking environments. Additionally, the paper discusses different DDoS defense systems based on ML techniques that make use of a virtualized environment, including cloud computing, software-defined network, and network functions virtualization environments. As the development of the Internet of Things (IoT) has been the subject of significant research attention in recent years, the paper also discusses ML approaches as security solutions against DDoS attacks in IoT environments. Furthermore, the paper recommends a number of directions for future research. This paper is intended to assist the research community with the design and development of effective defense systems capable of overcoming different types of DDoS attacks.
- Research Article
- 10.1080/10255842.2025.2584378
- Nov 4, 2025
- Computer Methods in Biomechanics and Biomedical Engineering
Smart health systems integrate advanced machine learning (ML) approaches to strengthen disease prediction, patient monitoring, and personalized healthcare solutions. They draw on electronic health records (EHRs), mobile health (M-Health), and electronic medicine (E-Medicine) to process vast amounts of medical data in a timely and effective way. In this study, a new forecasting model is proposed employing Stacking Classifier (StackingC) and Bagging Classifier (BaggingC) models, which were optimized using prairie dog optimization (PDO) and the sooty tern optimization algorithm (STOA). These optimization methods maximize attribute selection and model accuracy, ensuring strong and accurate forecasts. It is apparent from the results that StackingC is superior to BaggingC in overall accuracy, at 0.979 against 0.958. Although BaggingC achieved a higher training accuracy (0.961), StackingC generalized better on the test set (0.942). We further introduce a sensitivity analysis of the best hybrid models, STPD and STST, to demonstrate their consistency in risk prediction. STPD performed best with an overall accuracy of 0.985, followed by STST at 0.980. These findings attest to the strength of hybrid ML approaches in intelligent healthcare settings, ensuring improved patient outcomes through accurate and robust prediction. This study contributes to predictive analytics in health care through its model optimization and sensitivity analysis approaches.
- Research Article
50
- 10.1016/j.cscm.2023.e02723
- Nov 29, 2023
- Case Studies in Construction Materials
Ultra-high-performance concrete (UHPC) is a sustainable construction material that can be applied as a substitute for conventional cement concrete. Artificial intelligence methods have been used to evaluate concrete composites to save time and money in the construction industry. This study therefore applied machine learning (ML) and hybrid ML approaches to predict the compressive and flexural strength of UHPC. A dataset of 626 compressive strength (CS) and 317 flexural strength (FS) data points was collected from published research articles, and fourteen important variables were selected as input parameters for the ML and hybrid ML algorithms. This research used XGBoost, LightGBM, and a hybrid XGBoost-LightGBM algorithm to predict UHPC strength. Grid search (GS) techniques were used to tune the model hyperparameters in search of higher accuracy and efficiency. The ML and hybrid ML models were trained and tested using statistical assessments such as the coefficient of determination (R-square), root mean square error (RMSE), mean absolute error (MAE), and coefficient of efficiency (CE). The results showed that the hybrid ML algorithm was superior to the individual XGBoost and LightGBM algorithms in terms of R-square and RMSE for both compressive and flexural strength prediction. The hybrid model and the two individual ML models achieved considerable CS R-square values, above 0.94 at the testing stage and just over 0.97 at the training stage. For CS prediction, the hybrid ML model reached R-square values of almost 0.996 in training and 0.963 in testing. For FS prediction, the hybrid model and the two traditional ML models reached R-square values of almost 0.95 in training and around 0.81 in testing. Among them, the tuned hybrid XGB-LGB model gave the highest accuracy and lowest error for both CS and FS of UHPC.
Additionally, SHAP investigation reveals the impact and relationships of the input variables with the output variables. The SHAP analysis shows that the curing age and steel fiber content input parameters had the highest positive impact on the compressive and flexural strength of UHPC.
- Research Article
- 10.3390/info16090730
- Aug 25, 2025
- Information
The growing complexity and size of healthcare systems have rendered fraud detection increasingly challenging; however, the current literature lacks a holistic view of the latest machine learning (ML) techniques with practical implementation concerns. The present study addresses this gap by highlighting the importance of machine learning (ML) in preventing and mitigating healthcare fraud, evaluating recent advancements, investigating implementation barriers, and exploring future research dimensions. To further address the limited research on the evaluation of machine learning (ML) and hybrid approaches, this study considers a broad spectrum of ML techniques, including supervised ML, unsupervised ML, deep learning, and hybrid ML approaches such as SMOTE-ENN, explainable AI, federated learning, and ensemble learning. The study also explores their potential use in enhancing fraud detection in imbalanced and multidimensional datasets. A significant finding of the study was the identification of commonly employed datasets, such as Medicare, the List of Excluded Individuals and Entities (LEIE), and Kaggle datasets, which serve as a baseline for evaluating machine learning (ML) models. The study's findings comprehensively identify the challenges of employing machine learning (ML) in healthcare systems, including data quality, system scalability, regulatory compliance, and resource constraints. The study provides actionable insights, such as model interpretability to enable regulatory compliance and federated learning for confidential data sharing, which are particularly relevant for policymakers, healthcare providers, and insurance companies that intend to deploy a robust, scalable, and secure fraud detection infrastructure. The study presents a comprehensive framework for enhancing real-time healthcare fraud detection through self-learning, interpretable, and safe machine learning (ML) infrastructures, integrating theoretical advancements with practical application needs.
- Conference Article
2
- 10.1145/3330430.3333649
- Jun 24, 2019
Relatedness between user input and an ideal response is a salient feature required for the proper functioning of an Intelligent Tutoring System (ITS) using natural language processing. Improper assessment of text input causes maladaptation in ITSs. Meta-assessment of user responses in ITSs can improve instruction efficacy and user satisfaction. Therefore, this paper evaluates the quality of semantic matching between user input and the expected response in AutoTutor, an ITS which holds a conversation with the user in natural language. AutoTutor's dialogue is driven by the AutoTutor Conversation Engine (ACE), which uses a combination of Latent Semantic Analysis (LSA) and Regular Expressions (RegEx) to assess user input. We assessed ACE via responses from 219 Amazon Mechanical Turk users, who answered 118 electronics questions broken into 5202 response pairings (n = 5202). These analyses explore the relationship between RegEx and LSA, the agreement between the two human judges, and the agreement between the human judges and ACE. Additionally, we calculated precision and recall. As expected, regular expressions and LSA had a moderate, positive relationship, and the agreement between ACE and the human judges was fair, but slightly lower than the agreement between the human judges themselves.
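The LSA-plus-RegEx assessment ACE performs can be sketched as a cosine similarity in a truncated-SVD space combined with a pattern match. The tiny corpus, the regex, and the equal 0.5 weighting are illustrative assumptions, not ACE's actual configuration:

```python
# Sketch: score a learner response against an ideal answer with an LSA
# cosine similarity plus a RegEx hit, then combine the two scores.
import re
import numpy as np

corpus = ["current flows through the closed circuit",
          "voltage drives current through a resistor",
          "the battery stores chemical energy",
          "a resistor limits the current in the circuit"]
vocab = sorted({w for doc in corpus for w in doc.split()})
td = np.array([[doc.split().count(w) for w in vocab] for doc in corpus], float)

# LSA: truncated SVD of the document-term matrix
U, s, Vt = np.linalg.svd(td, full_matrices=False)
k = 2
def lsa_vec(text):
    counts = np.array([text.split().count(w) for w in vocab], float)
    return counts @ Vt[:k].T            # fold the text into the latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

ideal = "current flows through the circuit"
answer = "the current flows in the circuit"
lsa_score = cosine(lsa_vec(ideal), lsa_vec(answer))
regex_score = 1.0 if re.search(r"\bcurrent\b.*\bcircuit\b", answer) else 0.0
combined = 0.5 * lsa_score + 0.5 * regex_score
print(f"LSA={lsa_score:.2f} RegEx={regex_score:.1f} combined={combined:.2f}")
```

A production system would train the SVD on a large domain corpus and hand-author one regex per expectation, as the paper describes for AutoTutor's question bank.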
- Research Article
14
- 10.4067/s0718-09342005000300004
- Jan 1, 2005
- Revista Signos
This article presents the combination of Latent Semantic Analysis (LSA) with other natural language processing techniques (lemmatization, removal of function words, and word-sense disambiguation) to improve the automatic assessment of free-text answers. The free-text answer assessment system called Atenea (Alfonseca & Pérez, 2004) served as the experimental framework for testing the combined scheme. Atenea is a system capable of asking questions, chosen at random or according to the student's profile, and assigning them a numerical grade. The experimental results show that for all the datasets in which the NLP techniques were combined with LSA, the Pearson correlation between the grades given by Atenea and the grades given by the teachers for the same set of questions improves. The cause may lie in the complementarity between LSA, which works at a shallow semantic level, and the rest of the NLP techniques used in Atenea, which focus more on the lexical and syntactic levels.
- Research Article
75
- 10.3390/math10091480
- Apr 28, 2022
- Mathematics
The negative effect of financial crimes on financial institutions has grown dramatically over the years. To detect crimes such as credit card fraud, several single and hybrid machine learning approaches have been used. However, these approaches have a significant limitation: different hybrid algorithms were not investigated further on a given dataset. This research proposes and investigates seven hybrid machine learning models to detect fraudulent activities on a real-world dataset. The developed hybrid models consisted of two phases: state-of-the-art machine learning algorithms were first used to detect credit card fraud; then, hybrid methods were constructed based on the best single algorithm from the first phase. Our findings indicated that the hybrid model Adaboost + LGBM is the champion model, as it displayed the highest performance. Future studies should focus on different types of hybridization and algorithms in the credit card domain.
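The two-phase construction described here can be sketched as: rank single classifiers on a held-out split, then build a hybrid around the winner. The imbalanced toy data, the candidate models, and soft voting with a boosted ensemble as the hybridization (the paper's champion was Adaboost + LGBM) are illustrative assumptions:

```python
# Sketch: phase 1 picks the best single classifier; phase 2 forms a
# hybrid by soft-voting the winner with a boosted ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, weights=[0.95],
                           random_state=0)        # ~5% minority "fraud" class
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0,
                                          stratify=y)

# Phase 1: rank single models on the held-out split
singles = {"logreg": LogisticRegression(max_iter=1000),
           "tree": DecisionTreeClassifier(max_depth=4, random_state=0)}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
          for name, m in singles.items()}
best_name = max(scores, key=scores.get)

# Phase 2: hybridize the winner with a boosted ensemble via soft voting
hybrid = VotingClassifier(
    estimators=[("best", singles[best_name]),
                ("ada", AdaBoostClassifier(random_state=0))],
    voting="soft").fit(X_tr, y_tr)
print(f"best single: {best_name} ({scores[best_name]:.3f}); "
      f"hybrid accuracy: {hybrid.score(X_te, y_te):.3f}")
```

On a fraud problem this imbalanced, precision/recall on the minority class would matter more than raw accuracy; accuracy is used here only to keep the sketch short.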
- Conference Article
2
- 10.1109/icmlc.2005.1527306
- Jan 1, 2005
This paper presents a new method of hierarchical text clustering based on the combination of latent semantic analysis (LSA) and a hierarchical TGSOM, called the TCBLHT method. Text clustering results from traditional methods cannot show a hierarchical structure; however, hierarchical structure is very important in text clustering. The TCBLHT method automatically achieves hierarchical text clustering and establishes a vector space model (VSM) of term weights using the theory of LSA, so that semantic relations are included in the vector space model. Both theoretical analysis and experimental results confirm that the TCBLHT method decreases the number of vectors and enhances the efficiency and precision of text clustering.
- Research Article
1
- 10.5539/mas.v3n9p72
- Aug 17, 2009
- Modern Applied Science
Text clustering has been recognized as an important component of data mining. Self-Organizing Map (SOM)-based models have been found to have certain advantages for clustering sizeable text data. However, existing approaches lack an adaptive hierarchical structure within a single model. This paper presents a new method of hierarchical text clustering based on the combination of latent semantic analysis (LSA) and a hierarchical GSOM, called the LSA-HGSOM method. Text clustering results from traditional methods cannot show a hierarchical structure; however, hierarchical structure is very important in text clustering. The LSA-HGSOM method automatically achieves hierarchical text clustering and establishes a vector space model (VSM) of term weights using the theory of LSA, so that semantic relations are included in the vector space model. Both theoretical analysis and experimental results confirm that the LSA-HGSOM method decreases the number of vectors and enhances the efficiency and precision of text clustering.
- Book Chapter
2
- 10.1007/978-3-642-23982-3_1
- Jan 1, 2011
Text clustering has been recognized as an important component of data mining. Self-Organizing Map (SOM)-based models have been found to have certain advantages for clustering sizeable text data. However, existing approaches lack an adaptive hierarchical structure within a single model. This paper presents a new method of hierarchical text clustering based on the combination of latent semantic analysis (LSA) and a hierarchical GSOM, called the LSA-HGSOM method. Text clustering results from traditional methods cannot show a hierarchical structure; however, hierarchical structure is very important in text clustering. The LSA-HGSOM method automatically achieves hierarchical text clustering and establishes a vector space model (VSM) of term weights using the theory of LSA, so that semantic relations are included in the vector space model. Both theoretical analysis and experimental results confirm that the LSA-HGSOM method decreases the number of vectors and enhances the efficiency and precision of text clustering.
- Conference Article
11
- 10.2118/218562-ms
- Apr 22, 2024
The growth of machine learning (ML) approaches has sparked innovations in many applications, including hydraulic fracturing design. The crucial drawback in these models is the subjectivity and expertise of the design engineers, which could risk under-realizing the true reservoir and production potential. To overcome this, we incorporate the physics of fracturing design theory into ML models through a hybridized approach. A method consolidating complete physics that integrated reservoir characteristics, fracturing diagnostics, and production performance was applied to 71 parameters, of which 22 were generated randomly within practical minimum-maximum ranges and 49 were generated using empirical and analytical correlations. The inputs included reservoir rock and fluid properties, fracturing fluid, proppant and treatment parameters, and fracture conductivity results. The dataset was built so that only two outputs from the analysis of a small injection/falloff test were required: transmissibility from the after-closure analysis and the net pressure. The final model outputs included crosslinked fluid efficiency, pad percent for safe mode and tip screenout mode, proppant mass, maximum allowable proppant concentration, and dimensionless productivity index. The ML model also has a genetic algorithm optimizer loop downstream to optimize the fracturing treatment design to maximize production. The approach yielded a broad range of output values, and 10,000 rows of the dataset were finalized. The dataset is also appended with the optimized dimensionless fracture conductivity and dimensionless productivity index calculated with the classical boundary element routine. This synthetically constructed dataset was then fed to a feed-forward neural network to generate data-based models after tuning the hyperparameters. The multilayer perceptron model was used here, and all output variables were evaluated with coupled performance metrics.
Root mean square error, mean absolute percentage error, and coefficient of determination were used as performance metrics and demonstrated the model's significance, with values of 0.16, 0.77, and 0.96, respectively. The trained model serves as a backbone that can be refined through iterative updates with a small real-field dataset. The cost functions of the predictors can be optimized by tuning the hyperparameters, which are generated with the governing equations for fluid flow through porous media, fluid leakoff, and fracturing theory presented in the literature, guided by specific field data. A comparison is also performed, using the same performance metrics on a small real-field dataset, between a purely data-driven (classification) ML approach and this hybrid ML approach, where the latter shows significant improvement in predictions. Physics-based ML gives the advantage of intrinsic causality in the synthetic dataset. Transfer predictive learning opens an array of opportunities for small-data utilization. The method bolsters full-scale deep-learning model creation in fracturing and in similar domains where limited records are available.
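The downstream genetic algorithm optimizer loop can be sketched generically. The toy fitness function stands in for the ML-predicted dimensionless productivity index, and the two design variables, their bounds, and the GA settings are illustrative assumptions:

```python
# Minimal genetic algorithm: tournament selection, uniform crossover,
# Gaussian mutation, and elitism, maximizing a stand-in fitness.
import numpy as np

rng = np.random.default_rng(0)
bounds = np.array([[0.0, 1.0],      # pad fraction (illustrative)
                   [0.0, 10.0]])    # proppant mass, arbitrary units

def fitness(p):                     # stand-in for the ML production predictor
    pad, mass = p
    return -(pad - 0.35) ** 2 - 0.02 * (mass - 6.0) ** 2

pop = rng.uniform(bounds[:, 0], bounds[:, 1], size=(40, 2))
for _ in range(60):
    f = np.array([fitness(p) for p in pop])
    # tournament selection: each child slot keeps the fitter of two picks
    i, j = rng.integers(0, len(pop), (2, len(pop)))
    parents = np.where((f[i] > f[j])[:, None], pop[i], pop[j])
    # uniform crossover with a shuffled mate, then Gaussian mutation
    mates = parents[rng.permutation(len(parents))]
    mask = rng.random(parents.shape) < 0.5
    children = np.where(mask, parents, mates)
    children += rng.normal(0, 0.05, children.shape) * (bounds[:, 1] - bounds[:, 0])
    children = np.clip(children, bounds[:, 0], bounds[:, 1])
    children[0] = pop[np.argmax(f)]          # elitism: keep current best
    pop = children

best = pop[np.argmax([fitness(p) for p in pop])]
print("best design (pad fraction, proppant mass):", np.round(best, 2))
```

In the paper's workflow, `fitness` would call the trained neural network so the GA searches treatment designs that maximize predicted production.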