Time series forecasting for bug resolution using machine learning and deep learning models
Predicting bug fix times is a key objective for improving software maintenance and supporting planning in open source projects. In this study, we evaluate the effectiveness of different time series forecasting models applied to real-world data from multiple repositories, comparing local (one model per project) and global (a single model trained across multiple projects) approaches. We considered classical models (Naive, Linear Regression, Random Forest) and neural networks (MLP, LSTM, GRU), with global extensions including Random Forest and LSTM with project embeddings. The results highlight that, at the local level, Random Forest achieves lower errors and better classification metrics than deep learning models in several cases. However, global models show greater robustness and generalizability: in particular, the global Random Forest significantly reduces the mean error and maintains high performance in terms of accuracy and F1 score, while the global LSTM captures temporal dependencies and provides additional insights into cross-project dynamics. The explainable AI techniques adopted (permutation importance, saliency maps, and embedding analysis) allow us to interpret the main drivers of forecasts, confirming the role of process variables and temporal characteristics. Overall, the study demonstrates that an integrated approach, combining classical models and deep learning in a global perspective, offers more reliable and interpretable forecasts to support software maintenance.
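Permutation importance, one of the XAI techniques the abstract mentions, can be sketched generically: shuffle one feature at a time and measure the drop in a model's score. The snippet below is an illustrative toy (the scoring function and data are invented, not the study's pipeline).

```python
import random

def permutation_importance(score_fn, X, y, n_features, n_repeats=10, seed=0):
    """Average drop in score after shuffling one feature column at a time.

    score_fn(X, y) -> float (higher is better); X is a list of rows.
    """
    rng = random.Random(seed)
    baseline = score_fn(X, y)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [col[i]] + row[j + 1:] for i, row in enumerate(X)]
            drops.append(baseline - score_fn(X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy setup: the "model" uses only feature 0, so feature 1 should score ~0.
X = [[i, i % 3] for i in range(40)]
y = [row[0] * 2.0 for row in X]

def neg_mse_of_linear_model(X, y):
    preds = [row[0] * 2.0 for row in X]   # fixed model: ignores feature 1
    return -sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

imp = permutation_importance(neg_mse_of_linear_model, X, y, n_features=2)
```

Shuffling the unused feature leaves the score unchanged, so its importance is exactly zero, while the used feature shows a clear positive drop.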
- Research Article
- 10.1371/journal.pone.0316919
- Jan 17, 2025
- PLOS ONE
Purpose: In this study, we investigated the performance of deep learning (DL) models to differentiate between normal and glaucomatous visual fields (VFs) and classify glaucoma from the early to the advanced stage, to observe whether a DL model can stage glaucoma as per the Mills criteria using only the pattern deviation (PD) plots. The DL model results were compared with a machine learning (ML) classifier trained on conventional VF parameters. Methods: A total of 265 PD plots and 265 numerical datasets of Humphrey 24–2 VF images were collected from 119 normal and 146 glaucomatous eyes to train the DL models to classify the images into four groups: normal, early glaucoma, moderate glaucoma, and advanced glaucoma. Two popular pre-trained DL models, ResNet18 and VGG16, were trained on the PD images using five-fold cross-validation (CV), and performance was observed using balanced, pre-augmented data (n = 476 images), the imbalanced original data (n = 265), and feature extraction. The trained images were further investigated using the Grad-CAM visualization technique. Moreover, four ML models were trained on the global indices: mean deviation (MD), pattern standard deviation (PSD), and visual field index (VFI), using five-fold CV to compare classification performance with the DL models' results. Results: The DL model ResNet18, trained on balanced, pre-augmented PD images, achieved high accuracy in classifying the groups, with an overall F1-score of 96.8%, precision of 97.0%, recall of 96.9%, and specificity of 99.0%. The highest F1-score was 87.8% for ResNet18 with the original dataset and 88.7% for VGG16 with feature extraction. The DL models successfully localized the affected VF loss in PD plots. Among the ML models, the random forest (RF) classifier performed best with an F1-score of 96%. Conclusion: The DL model trained on PD plots was promising in differentiating normal and glaucomatous groups and performed similarly to conventional global indices.
Hence, the evidence-based DL model trained from PD images demonstrated that glaucoma can be staged using only PD plots, in line with the Mills criteria. This automated DL model will assist clinicians in precision glaucoma detection and progression management during extensive glaucoma screening.
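The five-fold cross-validation used here is a standard splitting scheme; a minimal, generic fold generator (not the authors' code) looks like this:

```python
import random

def k_fold_indices(n_samples, k=5, seed=42):
    """Shuffle sample indices once, then yield (train, test) index lists per fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # round-robin split keeps fold sizes balanced
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# 265 samples, as in the study's dataset size.
splits = list(k_fold_indices(265, k=5))
```

Each sample lands in the test set of exactly one fold, so the five test sets partition the data.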
- Research Article
- 10.1038/s41598-024-66481-4
- Jul 8, 2024
- Scientific Reports
The need for intubation in methanol-poisoned patients, if not predicted in time, can lead to irreparable complications and even death. Artificial intelligence (AI) techniques like machine learning (ML) and deep learning (DL) greatly aid in accurately predicting intubation needs for methanol-poisoned patients. So, our study aims to assess Explainable Artificial Intelligence (XAI) for predicting intubation necessity in methanol-poisoned patients, comparing deep learning and machine learning models. This study analyzed a dataset of 897 patient records from Loghman Hakim Hospital in Tehran, Iran, encompassing cases of methanol poisoning, including those requiring intubation (202 cases) and those not requiring it (695 cases). Eight established ML (SVM, XGB, DT, RF) and DL (DNN, FNN, LSTM, CNN) models were used. Techniques such as tenfold cross-validation and hyperparameter tuning were applied to prevent overfitting. The study also focused on interpretability through SHAP and LIME methods. Model performance was evaluated based on accuracy, specificity, sensitivity, F1-score, and ROC curve metrics. Among DL models, LSTM showed superior performance in accuracy (94.0%), sensitivity (99.0%), specificity (94.0%), and F1-score (97.0%). CNN led in ROC with 78.0%. For ML models, RF excelled in accuracy (97.0%) and specificity (100%), followed by XGB with sensitivity (99.37%), F1-score (98.27%), and ROC (96.08%). Overall, RF and XGB outperformed other models, with accuracy (97.0%) and specificity (100%) for RF, and sensitivity (99.37%), F1-score (98.27%), and ROC (96.08%) for XGB. ML models surpassed DL models across all metrics, with accuracies from 93.0% to 97.0% for DL and 93.0% to 99.0% for ML. Sensitivities ranged from 98.0% to 99.37% for DL and 93.0% to 99.0% for ML. DL models achieved specificities from 78.0% to 94.0%, while ML models ranged from 93.0% to 100%. F1-scores for DL were between 93.0% and 97.0%, and for ML between 96.0% and 98.27%. 
DL models scored ROC between 68.0% and 78.0%, while ML models ranged from 84.0% to 96.08%. Key features for predicting intubation necessity include GCS at admission, ICU admission, age, longer folic acid therapy duration, elevated BUN and AST levels, VBG_HCO3 at the initial record, and the presence of hemodialysis. This study showcases XAI's effectiveness in predicting intubation necessity in methanol-poisoned patients. ML models, particularly RF and XGB, outperform their DL counterparts, underscoring their potential for clinical decision-making.
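All of the reported metrics (accuracy, sensitivity, specificity, F1-score) derive from the binary confusion matrix; a minimal sketch with hypothetical counts (not the study's results):

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall), specificity, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)            # true-positive rate
    specificity = tn / (tn + fp)            # true-negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, f1

# Hypothetical counts for illustration only.
acc, sens, spec, f1 = binary_metrics(tp=190, fp=10, tn=660, fn=12)
```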
- Research Article
- 10.1007/s11356-024-35764-8
- Jan 1, 2025
- Environmental Science and Pollution Research
Human-induced global warming, primarily attributed to the rise in atmospheric CO2, poses a substantial risk to the survival of humanity. While most research focuses on predicting annual CO2 emissions, which are crucial for setting long-term emission mitigation targets, the precise prediction of daily CO2 emissions is equally vital for setting short-term targets. This study examines the performance of 14 models in predicting daily CO2 emissions data from 1/1/2022 to 30/9/2023 across the top four polluting regions (China, India, the USA, and the EU27&UK). The 14 models used in the study include four statistical models (ARMA, ARIMA, SARMA, and SARIMA), three machine learning models (support vector machine (SVM), random forest (RF), and gradient boosting (GB)), and seven deep learning models (artificial neural network (ANN), recurrent neural network variations such as gated recurrent unit (GRU), long short-term memory (LSTM), bidirectional-LSTM (BILSTM), and three hybrid combinations of CNN-RNN). Performance evaluation employs four metrics (R2, MAE, RMSE, and MAPE). The results show that the machine learning (ML) and deep learning (DL) models, with higher R2 values (0.714–0.932) and lower RMSE values (0.247–0.480), outperformed the statistical models, which had R2 values (−0.060 to 0.719) and RMSE values (0.537–1.695), in predicting daily CO2 emissions across all four regions. The performance of the ML and DL models was further enhanced by differencing, a technique that improves accuracy by ensuring stationarity and creating additional features and patterns from which the model can learn. Additionally, applying ensemble techniques such as bagging and voting improved the performance of the ML models by approximately 9.6%, whereas hybrid combinations of CNN-RNN enhanced the performance of the RNN models. In summary, the performance of both the ML and DL models was relatively similar.
However, due to the high computational requirements associated with DL models, the recommended models for daily CO2 emission prediction are ML models using the ensemble technique of voting and bagging. This model can assist in accurately forecasting daily emissions, aiding authorities in setting targets for CO2 emission reduction.
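The differencing technique credited with the accuracy gain is, in its simplest first-order form, just subtracting each value from its successor; a generic sketch (the paper may use a different lag or order):

```python
def difference(series, lag=1):
    """First-order differencing: y[t] - y[t-lag]; removes trend to help stationarity."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def undifference(diffed, first_values, lag=1):
    """Invert differencing given the first `lag` original values."""
    restored = list(first_values)
    for d in diffed:
        restored.append(restored[-lag] + d)
    return restored

daily = [10.0, 10.5, 11.2, 10.8, 11.5, 12.0]   # toy daily-emissions series
d1 = difference(daily)
back = undifference(d1, daily[:1])
```

Forecasting the differenced series and then inverting the transform recovers predictions on the original scale.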
- Abstract
- 10.1016/j.ijrobp.2022.07.946
- Oct 22, 2022
- International Journal of Radiation Oncology*Biology*Physics
Comparison of Machine Learning and Deep Learning Methods for the Prediction of Osteoradionecrosis Resulting from Head and Neck Cancer Radiation Therapy
- Research Article
- 10.3390/risks13050099
- May 20, 2025
- Risks
In emerging markets like Vietnam, where student borrowers often lack traditional credit histories, accurately predicting loan eligibility remains a critical yet underexplored challenge. While machine learning and deep learning techniques have shown promise in credit scoring, their comparative performance in the context of student loans has not been thoroughly investigated. This study evaluates and compares the predictive effectiveness of four supervised learning models in forecasting student credit eligibility: Random Forest, Gradient Boosting, Support Vector Machine, and a Deep Neural Network (implemented with PyTorch version 2.6.0). Primary data were collected from 1024 university students through structured surveys covering academic, financial, and personal variables. The models were trained and tested on the same dataset and evaluated using a comprehensive set of classification and regression metrics. The findings reveal that each model exhibits distinct strengths. The Deep Neural Network achieved the highest classification accuracy (85.55%), while Random Forest demonstrated robust performance, particularly in providing balanced results across classification metrics. Gradient Boosting was effective in recall-oriented tasks, and the Support Vector Machine demonstrated strong precision for the positive class, although its recall was lower than that of the other models. The study highlights the importance of aligning model selection with specific application goals, such as prioritizing accuracy, recall, or interpretability. It offers practical implications for financial institutions and universities in developing machine learning and deep learning tools for student loan eligibility prediction. Future research should consider longitudinal data, behavioral factors, and hybrid modeling approaches to further optimize predictive performance in educational finance.
- Research Article
- 10.1016/j.adro.2022.101163
- Dec 27, 2022
- Advances in Radiation Oncology
Comparison of Machine-Learning and Deep-Learning Methods for the Prediction of Osteoradionecrosis Resulting From Head and Neck Cancer Radiation Therapy
- Research Article
- 10.1038/s41598-024-82931-5
- Dec 28, 2024
- Scientific Reports
Failure to predict stroke promptly may lead to delayed treatment, causing severe consequences like permanent neurological damage or death. Early detection using deep learning (DL) and machine learning (ML) models can enhance patient outcomes and mitigate the long-term effects of strokes. The aim of this study is to compare these models, exploring their efficacy in predicting stroke. This study analyzed a dataset comprising 663 records from patients hospitalized at Hazrat Rasool Akram Hospital in Tehran, Iran, including 401 healthy individuals and 262 stroke patients. A total of eight established ML (SVM, XGB, KNN, RF) and DL (DNN, FNN, LSTM, CNN) models were utilized to predict stroke. Techniques such as 10-fold cross-validation and hyperparameter tuning were implemented to prevent overfitting. The study also focused on interpretability through Shapley Additive Explanations (SHAP). Model performance was evaluated based on accuracy, specificity, sensitivity, F1-score, and ROC curve metrics. Among the DL models, LSTM showed superior sensitivity at 96.15%, while FNN exhibited better specificity (96.0%), accuracy (96.0%), F1-score (95.0%), and ROC (98.0%). Among the ML models, RF displayed higher sensitivity (99.9%), accuracy (99.0%), specificity (100%), F1-score (99.0%), and ROC (99.9%). Overall, RF outperformed all models; apart from RF, the DL models surpassed the ML models in most metrics. DL models (CNN, LSTM, DNN, FNN) achieved sensitivities from 93.0% to 96.15%, specificities from 80.0% to 96.0%, accuracies from 92.0% to 96.0%, F1-scores from 87.34% to 95.0%, and ROC scores from 95.0% to 98.0%. In contrast, ML models (KNN, XGB, SVM) showed sensitivities between 29.0% and 94.0%, specificities between 89.47% and 96.0%, accuracies between 71.0% and 95.0%, F1-scores between 44.0% and 95.0%, and ROC scores between 64.0% and 95.0%.
This study demonstrates the efficacy of DL and ML models in predicting stroke, with the RF model outperforming all others in key metrics. While DL models generally surpassed ML models, RF's exceptional performance highlights the potential of combining these technologies for early stroke detection, significantly improving patient outcomes by preventing severe consequences like permanent neurological damage or death.
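The ROC figures above summarize the area under the ROC curve; AUC can be computed rank-wise as the probability that a random positive scores above a random negative (the Mann-Whitney formulation). A generic sketch with toy scores, not the study's data:

```python
def roc_auc(labels, scores):
    """AUC = probability a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: higher scores mostly go to the positive class.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(y_true, y_score)
```

Here one positive/negative pair is mis-ranked (0.4 vs 0.5), giving 8 wins out of 9 pairs.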
- Research Article
- 10.37547/marketing-fmmej-05-02-02
- Feb 3, 2025
- Frontline Marketing, Management and Economics Journal
This study explores the use of machine learning (ML) and deep learning (DL) models for predicting stock price movements through sentiment analysis of financial news articles. Four models were evaluated: Random Forest (RF), Gradient Boosting (GB), Long Short-Term Memory (LSTM), and Bidirectional Encoder Representations from Transformers (BERT). The results showed that deep learning models, particularly BERT, outperformed traditional ML models, achieving higher accuracy, precision, recall, and F1 scores. BERT’s ability to capture contextual relationships in text proved superior in handling the complexities of financial news. This research highlights the effectiveness of sentiment analysis in stock market prediction and suggests that advanced ML and DL techniques can enhance forecasting accuracy. Future work could focus on refining these models by integrating more data sources and exploring hybrid approaches.
- Research Article
- 10.17485/ijst/v17i45.2728
- Dec 14, 2024
- Indian Journal Of Science And Technology
Objectives: To evaluate the efficiency of task prediction and resource allocation for load balancing (LB) in the cloud environment using a combined approach: Random Forest (RF) for task prediction and Particle Swarm Optimization with Convolutional Neural Networks (PSO-CNN) for resource prediction and allocation. Methods: The ensemble approach in the present study uses Random Forest (RF), a machine learning (ML) model, for task prediction, and PSO+CNN, a bio-inspired algorithm paired with a Deep Learning (DL) model, for optimization and resource allocation. The study employs PSO techniques to optimize the CNN in order to address algorithmic optimization in DL. The results show that the suggested model outperforms other models like CNN-LSTM (Long Short-Term Memory), CNN-GRU (Gated Recurrent Unit), and PSO-SVM (Support Vector Machine) in increasing the performance and efficacy of cloud systems. The experiment is implemented using Python and assessed using the publicly accessible Google Cluster dataset. Findings: The use of ML and DL techniques is found to be more efficient in cloud infrastructure than conventional methods. The study examines the performance of the RF, PSO, and CNN models and the hybrid RF-PSO-CNN model. The accuracy, precision, and F1-score metrics were used to assess the performance of the classification models. The recommended model, RF-PSO-CNN, outperforms the contrasted methods like CNN-LSTM, CNN-GRU, and PSO-SVM with an accuracy of 90%. As a result, both the classification assessment metrics and resource consumption show that the proposed model performs effectively. Novelty: The novel ensemble approach suggests the combined RF-PSO-CNN for LB in cloud computing. The task predicted by RF is assigned to the resource chosen by PSO and CNN, thereby improving the efficiency of task prediction and resource allocation.
Most of the research uses any two ML or DL methods for either predicting the tasks to be scheduled or choosing which resource to allocate. This study uses a combination of an ML method (RF), a bio-inspired algorithm (PSO), and a DL model (CNN) for both task and resource prediction concurrently, and it examines the effectiveness of LB in the cloud context.
Keywords: Load Balancing (LB), Task Scheduling, Resource Allocation, Random Forest (RF), Convolutional Neural Networks (CNN), Particle Swarm Optimization (PSO)
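The PSO component can be illustrated generically: particles track personal and global bests and update velocities with inertia, cognitive, and social terms. The minimal 1-D sketch below optimizes a toy cost function; the paper instead couples PSO with a CNN for resource selection.

```python
import random

def pso_minimize(cost, lo, hi, n_particles=20, n_iters=60, seed=1):
    """Minimal 1-D particle swarm: inertia + cognitive + social velocity update."""
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for _ in range(n_particles)]
    v = [0.0] * n_particles
    pbest = x[:]                               # each particle's best position so far
    pbest_cost = [cost(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g], pbest_cost[g]
    w, c1, c2 = 0.7, 1.5, 1.5                  # inertia, cognitive, social weights
    for _ in range(n_iters):
        for i in range(n_particles):
            v[i] = (w * v[i]
                    + c1 * rng.random() * (pbest[i] - x[i])
                    + c2 * rng.random() * (gbest - x[i]))
            x[i] = min(hi, max(lo, x[i] + v[i]))   # clamp to the search bounds
            c = cost(x[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = x[i], c
                if c < gbest_cost:
                    gbest, gbest_cost = x[i], c
    return gbest, gbest_cost

best_x, best_cost = pso_minimize(lambda x: (x - 3.0) ** 2, lo=-10, hi=10)
```

On this smooth quadratic the swarm converges close to the minimum at x = 3.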
- Conference Article
- 10.1109/pcems58491.2023.10136087
- Apr 5, 2023
Fake news refers to misleading or fake information spread over the internet or other communication networks. In our paper, we use different machine learning (ML) models and deep learning (DL) models for classifying news as fake or real. The different ML models used are k-nearest neighbor (KNN), random forest (RF), logistic regression, naive Bayes, and DL models like long short-term memory (LSTM), and gated recurrent units (GRU) for prediction. We developed a mechanism that combines the prediction probabilities of ML models and DL models for prediction. We achieved accuracy as high as 0.98 and F1 scores as high as 0.98 using our approach. We also analyze the results of classification using different graphs which give us meaningful insights into the accuracy of the prediction of different models. We use flow charts to demonstrate the flow of our proposed algorithm in the classification of news. The superiority of our model is demonstrated in experimental results.
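Combining the prediction probabilities of ML and DL models, as described, is essentially soft voting; a minimal sketch with hypothetical per-model probabilities:

```python
def soft_vote(prob_lists, weights=None):
    """Average each model's P(fake) per sample; label as fake when the mean > 0.5."""
    n_models = len(prob_lists)
    weights = weights or [1.0 / n_models] * n_models
    n_samples = len(prob_lists[0])
    fused = [sum(w * probs[i] for w, probs in zip(weights, prob_lists))
             for i in range(n_samples)]
    return fused, [int(p > 0.5) for p in fused]

# Hypothetical P(fake) outputs from three classifiers on four articles.
knn_p  = [0.90, 0.20, 0.60, 0.40]
rf_p   = [0.80, 0.10, 0.40, 0.45]
lstm_p = [0.95, 0.30, 0.70, 0.20]
fused, labels = soft_vote([knn_p, rf_p, lstm_p])
```

Uneven weights can favor the stronger models, e.g. giving the DL model a larger share of the vote.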
- Research Article
- 10.1016/j.asoc.2023.110534
- Jun 22, 2023
- Applied Soft Computing
Leaf disease detection using machine learning and deep learning: Review and challenges
- Research Article
- 10.1155/2022/5849995
- Feb 24, 2022
- Computational Intelligence and Neuroscience
Heart failure is the most common cause of death in both males and females around the world. Cardiovascular diseases (CVDs), in particular, are the main cause of death worldwide, accounting for 30% of all fatalities in the United States and 45% in Europe. Artificial intelligence (AI) approaches such as machine learning (ML) and deep learning (DL) models are playing an important role in the advancement of heart failure therapy. The main objective of this study was to perform a network meta-analysis of patients with heart failure, stroke, hypertension, and diabetes by comparing the ML and DL models. A comprehensive search of five electronic databases was performed: ScienceDirect, EMBASE, PubMed, Web of Science, and IEEE Xplore. The search strategy followed the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) statement. The methodological quality of studies was assessed following the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) guidelines. A random-effects network meta-analysis forest plot with categorical data was used, along with subgroup testing for all four types of treatments and calculation of odds ratios (OR) with 95% confidence intervals (CI). Pooled network forest plots, funnel plots, and the league table, which shows the best algorithms for each outcome, were analyzed. Seventeen studies, with a total of 285,213 patients with CVDs, were included in the network meta-analysis. The statistical evidence indicated that the DL algorithms performed well in the prediction of heart failure, with an AUC of 0.843 and CI [0.840–0.845], while among the ML algorithms, the gradient boosting machine (GBM) achieved an average accuracy of 91.10% in predicting heart failure. An artificial neural network (ANN) performed well in the prediction of diabetes, with an OR and CI of 0.0905 [0.0489; 0.1673]. Support vector machine (SVM) performed better for the prediction of stroke, with an OR and CI of 25.0801 [11.4824; 54.7803].
Random forest (RF) performed well in the prediction of hypertension, with an OR and CI of 10.8527 [4.7434; 24.8305]. The findings of this work suggest that DL models can effectively advance the prediction of and knowledge about heart failure, but there is a lack of literature regarding DL methods in the field of CVDs. As a result, more DL models should be applied in this field. To confirm our findings, more meta-analyses (e.g., Bayesian network) and thorough research with a larger number of patients are encouraged.
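The OR and 95% CI figures quoted throughout follow the standard log-odds formula; a generic sketch with a hypothetical 2x2 table (not counts from the meta-analysis):

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and 95% CI from a 2x2 table [[a, b], [c, d]] (events/non-events per arm)."""
    or_value = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    lo = math.exp(math.log(or_value) - z * se_log)
    hi = math.exp(math.log(or_value) + z * se_log)
    return or_value, lo, hi

# Hypothetical counts for illustration only.
or_value, lo, hi = odds_ratio_ci(a=30, b=70, c=10, d=90)
```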
- Research Article
- 10.1016/j.envc.2021.100053
- Feb 19, 2021
- Environmental Challenges
Artisanal and Small-Scale Mining (ASM) landscapes form an integral part of the land use land cover (LULC) in the developing world. However, the spatial, spectral, and temporal footprints of ASM present some challenges for using most of the freely available optical satellite sensors for change analysis. The challenge is even more profound in tropical West African countries like Ghana, where there is prolonged cloud cover. While very few studies have used Sentinel-2 data to map change in ASM landscapes, none examined the contribution of individual S2 bands to the ASM classifications. Also, despite the capabilities of Machine Learning (ML) and Deep Learning (DL) models for LULC classification, few studies have compared the performance of different classifiers in mapping ASM landscapes. This study utilized Sentinel-2 data, four ML and DL models (Artificial Neural Network (ANN), Random Forest (RF), Support Vector Machines (SVM), and a pixel-based Convolutional Neural Network (CNN)), and image segmentation to examine the performance of S2 bands and ML and DL algorithms for change analysis in the ASM landscape, with the Birim Basin in Ghana as the study area. The result of the change analysis was used to assess changes in LULC during the recent ban on the expansion of ASM in the country. It was found that ANN is a better classifier of ASM, achieving the highest overall accuracy (OA) of 99.80% on the segmented Sentinel-2 bands. The study also found that Band 5, Vegetation Red Edge (VRE) 1, contributed most to classifying ASM, with the segmented VRE 1 outperforming the other predictors. In terms of expansion, ASM increased by 59.17 km2 within the period of the study (January 2017 to December 2018), suggesting that ASM still took place under the watch of the ban. The classification results showed that most of the periphery of the forest and farmland has been converted to ASM, with little disturbance within the interior of the forest reserves.
The study revealed that the ban was yielding very little or no results due to a number of policy deficiencies, including low staff strength, lack of logistics, and low remuneration. Enforcement of legal instruments against ASM and farming activities within the forest reserves, improvement of the monitoring systems, and intensification of public education on the value of the forest and the need to protect it are some of the major recommendations that could control encroachment on the forest reserves.
- Research Article
- 10.54254/2755-2721/86/20241566
- Jul 31, 2024
- Applied and Computational Engineering
This paper highlights the shift from classical machine learning to deep learning models and describes the fundamental methodology and developments in pedestrian activity recognition. The four main steps of the workflow are the gathering of datasets, pre-processing, designing and training the model, and evaluating the outcome. To extract pedestrian feature vectors, data must first be collected, cleaned, and processed from public or proprietary datasets. These vectors are used to train deep learning or machine learning models, which are subsequently assessed and fine-tuned for use in practical applications such as behaviour analysis and surveillance. For action recognition, conventional machine learning techniques like Random Forests (RF) and Support Vector Machines (SVM) have been used. SVMs, despite their potential computational complexity, identify the best hyperplanes for classification. The categorization rates for a variety of human behaviours have been enhanced by a combination strategy utilizing SVMs and decision trees. As shown in a study that uses smartphone accelerometers to accurately identify everyday activities, RFs can manage enormous datasets. Deep learning models that automatically learn complicated feature representations, such as VA-fusion, AGC-LSTM, and LC-POSEGAIT, provide improved performance. These models capture minute differences in pedestrian behaviour using CNN, RNN, and LSTM architectures. Interpretability, generalization to new datasets, and computing demands are some of the difficulties they encounter. Future developments could include using transfer learning to improve performance in many circumstances, combining deep learning and expert systems for improved interpretability, and utilizing distributed computing for efficient processing.
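For the smartphone-accelerometer study mentioned above, feature extraction is typically a sliding window with summary statistics; a generic sketch (window length, step, and features are illustrative assumptions):

```python
def window_features(signal, win=4, step=2):
    """Slide a window over a 1-D signal and emit (mean, std) feature pairs."""
    feats = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        mean = sum(w) / win
        var = sum((x - mean) ** 2 for x in w) / win   # population variance
        feats.append((mean, var ** 0.5))
    return feats

# Toy acceleration magnitudes: low-amplitude walking, then high-amplitude running.
accel = [0.0, 1.0, 0.0, 1.0, 5.0, 6.0, 5.0, 6.0]
feats = window_features(accel)
```

Each (mean, std) pair would then be fed to a classifier such as an RF or SVM.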
- Research Article
- 10.1007/s42452-021-04162-x
- Jan 20, 2021
- SN Applied Sciences
In an open source software development environment, it is hard to decide the number of group members required for resolving software issues. Developers generally reply to issues based solely on their domain knowledge and interest, and there are no predetermined groups. The developers openly collaborate on resolving the issues based on many factors, such as their interest, domain expertise, and availability. This study compares eight different algorithms employing machine learning and deep learning, namely Convolutional Neural Network, Multilayer Perceptron, Classification and Regression Trees, Generalized Linear Model, Bayesian Additive Regression Trees, Gaussian Process, Random Forest, and Conditional Inference Tree, for predicting group size in five open source software projects developed and managed using the open source development framework GitHub. The social information foraging model has also been extended to predict group size in software issues, and its results compared to those obtained using the machine learning and deep learning algorithms. The prediction results suggest that the deep learning and machine learning models predict better than the extended social information foraging model, with the best-ranked model being a deep multilayer perceptron (RMSE: sequelize 1.21, opencv 1.17, bitcoin 1.05, aseprite 1.01, electron 1.16). It was also observed that issue labels helped improve the prediction performance of the machine learning and deep learning models. The prediction results of these models have been used to build an Issue Group Recommendation System as an Internet of Things application that recommends and alerts additional developers to help resolve an open issue.
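The RMSE values used to rank the models are computed as follows (a generic sketch with toy numbers, not the study's data):

```python
def rmse(actual, predicted):
    """Root mean squared error between actual and predicted group sizes."""
    n = len(actual)
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n) ** 0.5

# Toy example: predicted vs actual number of developers on five issues.
err = rmse([3, 5, 2, 4, 6], [4, 5, 3, 4, 5])
```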