A Comparative Analysis of Metaheuristic Feature Selection Methods in Software Vulnerability Prediction
Background: Early identification of software vulnerabilities is an essential step in achieving software security. In the era of artificial intelligence, software vulnerability prediction models (VPMs) are built using machine learning and deep learning approaches, and their effectiveness helps increase software quality. The handling of imbalanced datasets and dimensionality reduction are important factors affecting the performance of VPMs. Aim: The current study applies novel metaheuristic approaches to feature subset selection. Method: This paper performs a comparative analysis of forty-eight combinations of eight machine learning techniques and six metaheuristic feature selection methods on four public datasets. Results: The experimental results reveal that the performance of VPMs improves after applying the feature selection methods for both metrics-based and text-mining-based datasets. Additionally, the study applies the Wilcoxon signed-rank test to the results of metrics-based and text-features-based VPMs to evaluate which outperforms the other. Furthermore, it identifies the best-performing feature selection algorithm for each dataset based on AUC. Finally, this paper outperforms the benchmark studies in terms of F1-Score. Conclusion: The results conclude that the Grey Wolf Optimizer (GWO) performed satisfactorily across all datasets.
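The paired comparison the abstract describes can be sketched with SciPy's Wilcoxon signed-rank test. This is a minimal illustration, not the paper's analysis: the AUC values below are made-up stand-ins for the paired results of metrics-based versus text-features-based VPMs on the same learner/dataset combinations.

```python
from scipy.stats import wilcoxon

# Illustrative paired AUC values (made up; not the paper's actual results)
metrics_auc = [0.71, 0.68, 0.74, 0.69, 0.72, 0.70, 0.73, 0.67]
text_auc    = [0.78, 0.75, 0.77, 0.74, 0.79, 0.76, 0.80, 0.73]

# The test ranks the absolute paired differences and checks whether one
# condition consistently outperforms the other
stat, p_value = wilcoxon(metrics_auc, text_auc)
print("W = %.1f, p = %.4f" % (stat, p_value))
if p_value < 0.05:
    print("difference significant at alpha = 0.05")
```

Because the test is non-parametric and operates on paired ranks, it suits model-comparison settings where per-dataset scores are not normally distributed.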
- Research Article
- 10.1002/smr.2164
- Apr 22, 2019
- Journal of Software: Evolution and Process
Software vulnerabilities pose an increasing security risk to software systems and may be exploited to attack and harm a system. Some security vulnerabilities can be detected by static analysis tools and penetration testing, but these usually suffer from relatively high false positive rates. Software vulnerability prediction (SVP) models can be used to categorize software components as vulnerable or neutral before the software testing phase and thereby increase the efficiency and effectiveness of the overall verification process. The performance of a vulnerability prediction model is usually affected by the adopted classification algorithm, the adopted features, and the data balancing approach. In this study, we empirically investigate the effect of these factors on the performance of SVP models. Our experiments cover four data balancing methods, seven classification algorithms, and three feature types. The experimental results show that data balancing methods are effective for highly unbalanced datasets, text-based features are more useful, and ensemble-based classifiers mostly provide better results. For smaller datasets, the Random Forest algorithm provides the best performance, and for larger datasets, RusboostTree achieves better performance.
- Research Article
- 10.1016/j.infsof.2019.106204
- Nov 5, 2019
- Information and Software Technology
Better together: Comparing vulnerability prediction models
- Research Article
- 10.4018/ijamc.292508
- Jan 14, 2022
- International Journal of Applied Metaheuristic Computing
Any vulnerability in software creates a security threat and helps attackers gain unauthorized access to resources. Vulnerability prediction models help software engineers allocate their resources effectively to find vulnerable classes in the software before its delivery to customers. Vulnerable classes must be carefully reviewed by security experts and tested to identify potential threats that may arise in the future. In the present work, a novel technique based on the Grey Wolf algorithm and Random Forest is proposed for software vulnerability prediction. The Grey Wolf algorithm is a metaheuristic technique used here to select the best subset of features. The proposed technique was compared with other machine learning techniques in experiments on three publicly available datasets, and it was observed that our proposed technique (GW-RF) outperformed all other techniques for software vulnerability prediction.
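The wrapper idea behind GW-RF can be sketched as follows. This is a heavily simplified, hedged sketch, not the paper's algorithm: the dataset is synthetic, the pack size and iteration count are arbitrary, and the grey-wolf position update is reduced to a probabilistic "copy bits from the alpha wolf" step with random bit flips for exploration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in for a vulnerability dataset (illustrative only)
X, y = make_classification(n_samples=150, n_features=12, n_informative=4,
                           random_state=0)

def fitness(mask):
    """Wrapper fitness: cross-validated AUC of a Random Forest on the subset."""
    if mask.sum() == 0:          # an empty subset cannot be evaluated
        return 0.0
    clf = RandomForestClassifier(n_estimators=20, random_state=0)
    return cross_val_score(clf, X[:, mask.astype(bool)], y,
                           cv=3, scoring="roc_auc").mean()

pack = rng.integers(0, 2, size=(5, X.shape[1]))   # wolves = binary masks
alpha = pack[0].copy()
for _ in range(6):
    scores = np.array([fitness(w) for w in pack])
    alpha = pack[scores.argmax()].copy()          # best wolf leads the pack
    for i in range(len(pack)):
        # Simplified position update: copy each of alpha's bits with prob 0.5
        copy_bits = rng.random(X.shape[1]) < 0.5
        pack[i] = np.where(copy_bits, alpha, pack[i])
        # Random bit flips keep exploration alive
        flip = rng.random(X.shape[1]) < 0.1
        pack[i] = np.where(flip, 1 - pack[i], pack[i])

print("selected feature indices:", np.flatnonzero(alpha))
print("cross-validated AUC with subset: %.3f" % fitness(alpha))
```

A faithful GWO would track alpha, beta, and delta wolves and use the encircling-update equations; the sketch keeps only the "follow the leader" intuition that makes the wrapper search work.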
- Research Article
- 10.1016/j.iot.2024.101367
- Sep 7, 2024
- Internet of Things
This paper delves into the critical need for enhanced security measures within the Internet of Things (IoT) landscape, given inherent vulnerabilities that render IoT devices susceptible to various forms of cyber-attacks. The study emphasizes the importance of Intrusion Detection Systems (IDS) for continuous threat monitoring. The objective was to conduct a comprehensive evaluation of feature selection (FS) methods using various machine learning (ML) techniques for classifying traffic flows within datasets containing intrusions in IoT environments. An extensive benchmark analysis of ML techniques and FS methods was performed, assessing feature selection under different approaches, including Filter Feature Ranking (FFR), Filter-Feature Subset Selection (FSS), and Wrapper-based Feature Selection (WFS). FS becomes pivotal in handling vast IoT data by reducing irrelevant attributes, addressing the curse of dimensionality, enhancing model interpretability, and optimizing resources on devices with limited capacity. Key findings indicate that certain tree-based algorithms, such as J48 or PART, outperform other machine learning techniques (naive Bayes, multi-layer perceptron, logistic regression, adaptive boosting, or k-Nearest Neighbors) for traffic-flow classification, showcasing a good balance between performance and execution time. The advantages and drawbacks of FS methods are discussed, highlighting the main differences in results among the FS approaches. Filter-Feature Subset Selection (FSS) approaches such as CFS may be more suitable than Filter Feature Ranking (FFR), which can select correlated attributes, or than Wrapper-based Feature Selection (WFS) methods, which may tailor attribute subsets to specific ML techniques and have lengthy execution times. In any case, reducing attributes via FS has allowed classification to be optimized without compromising accuracy.
In this study, F1 score classification results above 0.99, along with a reduction of over 60% in the number of attributes, have been achieved in most experiments conducted across four datasets, both in binary and multiclass modes. This work emphasizes the importance of a balanced attribute selection process, taking into account threat detection capabilities and computational complexity.
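The attribute-reduction step the study reports (over 60% fewer attributes at high F1) can be illustrated with a filter-style selector. CFS itself is not available in scikit-learn, so a mutual-information filter stands in here as a hedged example; the synthetic dataset and the choice of k are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic stand-in for an IoT traffic dataset with many attributes
X, y = make_classification(n_samples=300, n_features=20, n_informative=4,
                           random_state=1)

# Filter selection: score each attribute against the label, keep the top k
selector = SelectKBest(mutual_info_classif, k=6).fit(X, y)
X_reduced = selector.transform(X)

reduction = 1 - X_reduced.shape[1] / X.shape[1]
print("kept %d of %d attributes (%.0f%% reduction)"
      % (X_reduced.shape[1], X.shape[1], 100 * reduction))
```

Because the filter scores attributes independently of any classifier, it avoids the lengthy execution times of wrapper methods — the same trade-off the abstract highlights.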
- Conference Article
- 10.1109/ase.2017.8115724
- Oct 1, 2017
Software security is an important aspect of ensuring software quality. The goal of this study is to help developers evaluate software security using traceable patterns and software metrics during development. The concept of traceable patterns is similar to design patterns but they can be automatically recognized and extracted from source code. If these patterns can better predict vulnerable code compared to traditional software metrics, they can be used in developing a vulnerability prediction model to classify code as vulnerable or not. By analyzing and comparing the performance of traceable patterns with metrics, we propose a vulnerability prediction model. This study explores the performance of some code patterns in vulnerability prediction and compares them with traditional software metrics. We use the findings to build an effective vulnerability prediction model. We evaluate security vulnerabilities reported for Apache Tomcat, Apache CXF and three stand-alone Java web applications. We use machine learning and statistical techniques for predicting vulnerabilities using traceable patterns and metrics as features. We found that patterns have a lower false negative rate and higher recall in detecting vulnerable code than the traditional software metrics.
- Research Article
- 10.1007/s10489-021-02324-3
- Mar 31, 2021
- Applied Intelligence
The detection of software vulnerabilities has long been considered a vital problem in the software security area. Nowadays, managing software security is challenging due to its increased complexity and diversity, so vulnerability detection applications play a significant part in software development and maintenance. The predictive ability of existing forecasting techniques in vulnerability detection is still weak, and metaheuristic optimization is among the more effective feature-definition methods used to determine software vulnerabilities. This paper proposes a novel software vulnerability prediction model based on a deep learning method and a SYMbiotic Genetic algorithm. To the best of our knowledge, we are the first to apply diploid genetic algorithms with deep learning networks to software vulnerability prediction. In the proposed method, a deep SYMbiotic-based genetic algorithm model (DNN-SYMbiotic GAs) learns the phenotyping of dominant features for software vulnerability prediction problems. The method aims to increase the detection of vulnerability patterns associated with vulnerable components in the software. Comprehensive experiments are conducted on several benchmark datasets taken from the Drupal, Moodle, and PHPMyAdmin projects. The obtained results reveal that the proposed method (DNN-SYMbiotic GAs) enhances vulnerability prediction, reflecting improved software quality prediction.
- Research Article
- 10.1109/tr.2016.2630503
- Mar 1, 2017
- IEEE Transactions on Reliability
Statistical prediction models can be an effective technique to identify vulnerable components in large software projects. Two aspects of vulnerability prediction models have a profound impact on their performance: 1) the features (i.e., the characteristics of the software) that are used as predictors and 2) the way those features are used in the setup of the statistical learning machinery. In a previous work, we compared models based on two different types of features: software metrics and term frequencies (text mining features). In this paper, we broaden the set of models we compare by investigating an array of techniques for the manipulation of said features. These techniques fall under the umbrella of dimensionality reduction and have the potential to improve the ability of a prediction model to localize vulnerabilities. We explore the role of dimensionality reduction through a series of cross-validation and cross-project prediction experiments. Our results show that in the case of software metrics, a dimensionality reduction technique based on confirmatory factor analysis provided an advantage when performing cross-project prediction, yielding the best F-measure for the predictions in five out of six cases. In the case of text mining, feature selection can make the prediction computationally faster, but no dimensionality reduction technique provided any other notable advantage.
- Book Chapter
- 10.1007/978-3-319-67274-8_6
- Jan 1, 2017
Detecting the vulnerable components of a web application is an important activity for allocating verification resources effectively. To date, most studies have proposed vulnerability prediction models based on private and public datasets. In this study, we aimed to design and implement a software vulnerability prediction web service hosted on the Azure cloud computing platform. We investigated several machine learning techniques available in the Azure Machine Learning Studio environment and observed that the best overall performance on three datasets is achieved with the Multi-Layer Perceptron method. Software metrics values are received from a web form and sent to the vulnerability prediction web service; the prediction result is then computed and shown on the web form to notify the testing expert. Training models were built on datasets that include vulnerability data from the Drupal, Moodle, and PHPMyAdmin projects. Experimental results showed that Artificial Neural Networks are a good alternative for building a vulnerability prediction model, and that building a web service for vulnerability prediction is a good approach for complex systems.
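The model behind such a service can be sketched with scikit-learn's multi-layer perceptron (the study itself used Azure ML Studio's implementation, so this is an assumption-laden stand-in). The synthetic metrics dataset and the (32, 16) hidden-layer architecture are illustrative choices only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a software-metrics dataset (illustrative only)
X, y = make_classification(n_samples=300, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A small MLP like the one a prediction web service might serve
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000,
                    random_state=0).fit(X_tr, y_tr)
acc = mlp.score(X_te, y_te)
print("held-out accuracy: %.3f" % acc)
```

In the deployed setting the abstract describes, the metrics submitted through the web form would be passed to `mlp.predict` and the result returned to the form.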
- Book Chapter
- 10.1007/978-981-19-1018-0_46
- Jan 1, 2022
Software vulnerabilities provide the gateway for attackers to breach the confidentiality, integrity, and availability of information systems. Therefore, the prediction of software vulnerabilities is an important concern for developing secure software. Software vulnerability prediction (SVP) models predict whether a software component is vulnerable or not. Studies have shown that a model's efficiency depends on its hyperparameter settings, so to build an effective model, the hyperparameters need to be optimized. In our study, we intend to find the impact of hyperparameter optimization on the performance of SVP models. We performed experiments using the Python hyperparameter optimization framework 'Optuna' to find the best hyperparameters for eight machine learning algorithms on three public datasets: Drupal, Moodle, and PHPMyAdmin. We found p-values below 0.05 in 19 out of 24 cases. Hence, hyperparameter optimization is 79.17% effective in increasing the efficacy of SVP models in our study. Keywords: Software vulnerability, Hyperparameter optimization, Machine learning algorithms
- Research Article
- 10.3233/jifs-189940
- Jan 1, 2021
- Journal of Intelligent & Fuzzy Systems
The advent of the era of artificial intelligence makes it possible for administrative subjects to use intelligent machines and systems to engage in administrative activities. Among these, the use of artificial intelligence in administrative discretion, the core of administrative law, is of particular concern. In the era of weak artificial intelligence, intelligent administrative discretion has been widely used in all aspects of administrative law enforcement, but administrative subjects have sometimes been negligent in exercising discretion. Looking forward to the era of strong artificial intelligence, artificial intelligence machines or systems may have the ability and power to exercise administrative discretion independently, but they cannot become the real subject of administrative discretion. Intelligent administrative discretion is conducive to administrative efficiency and guarantees the fairness of administrative behavior, but it also faces legal risks such as unfair discretionary results, opaque algorithm settings, and the weakening of government functions. Only by strengthening the legal basis, protecting the rights of the counterparty, improving the accuracy of the algorithm, and improving the status of the administrative subject can administrative discretionary behavior in the context of artificial intelligence be effectively regulated.
- Conference Article
- 10.1109/icaie53562.2021.00112
- Jun 1, 2021
Carrying out ideological and political education with the help of professional course teaching has become a hot topic in the new era. In the era of artificial intelligence, the application of new media and new technology can integrate the traditional advantages of ideological and political education with information technology, enhance its contemporary relevance and attraction, innovate the teaching mode, promote the seamless docking and deep integration of artificial intelligence technology with ideological and political education, and move ideological and political education into a new era of intelligence. Taking "Introduction to International Business" as an example, this paper discusses the teaching reform of ideological and political education in the era of artificial intelligence and the elements of such education. By improving teachers' ethics and ability, adopting intelligent education methods, and establishing a diversified artificial-intelligence evaluation system, the Introduction to International Business course is deeply integrated with ideological and political education.
- Research Article
- 10.37190/e-inf230102
- Jan 1, 2023
- e-Informatica Software Engineering Journal
Background: Prediction of software vulnerabilities is a major concern in the field of software security. Many researchers have worked to construct various software vulnerability prediction (SVP) models. The emerging machine learning domain aids in building effective SVP models, and the employment of data balancing/resampling techniques and optimal hyperparameters can upgrade their performance. Previous research has shown the impact of hyperparameter optimization (HPO) on machine learning algorithms and data balancing techniques. Aim: The current study aims to analyze the impact of dual hyperparameter optimization on metrics-based SVP models. Method: This paper proposes a methodology, using the Python framework Optuna, that optimizes the hyperparameters of both the machine learners and the data balancing techniques. For experimentation, we compared six combinations of five machine learners and five resampling techniques, considering default parameters and optimized hyperparameters. Results: Additionally, the Wilcoxon signed-rank test with the Bonferroni correction was applied, and it was observed that dual HPO performs better than HPO on learners alone or HPO on data balancers alone. Furthermore, the paper assesses the impact of data complexity measures and concludes that HPO does not improve performance on datasets that exhibit high overlap. Conclusion: The experimental analysis unveils that dual HPO is 64% effective in enhancing the productivity of SVP models.
- Conference Article
- 10.1145/3293882.3338985
- Jul 10, 2019
Vulnerability Prediction Models (VPMs) aim to identify vulnerable and non-vulnerable components in large software systems. However, VPMs present three major drawbacks: (i) finding an effective method to identify a representative set of features from which to construct an effective model; (ii) the way the features are utilized in the machine learning setup; and (iii) the implicit assumption that parameter optimization would not change the outcome of VPMs. To address these limitations, we investigate the effect of Bellwether analysis on VPMs. Specifically, we first develop a Bellwether algorithm to identify and select an exemplary subset of data, the Bellwether, to yield improved prediction accuracy against the growing-portfolio benchmark. Next, we build a machine learning approach with different parameter settings to show the improved performance of VPMs. The prediction results of the suggested models were assessed in terms of precision, recall, F-measure, and other statistical measures. Preliminary results show the Bellwether approach outperforms the benchmark technique across the applications studied, with F-measure values ranging from 51.1% to 98.5%.
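The core Bellwether step — train on each project in turn and keep the one whose model transfers best to the others — can be sketched briefly. This is a hedged illustration: the three "projects" are synthetic stand-ins with hypothetical names, and logistic regression is an arbitrary learner choice, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Three synthetic "projects" as stand-ins for real vulnerability datasets
projects = {name: make_classification(n_samples=150, n_features=10,
                                      random_state=seed)
            for name, seed in [("projA", 0), ("projB", 1), ("projC", 2)]}

def transfer_score(source):
    """Mean F1 when a model trained on `source` predicts the other projects."""
    Xs, ys = projects[source]
    model = LogisticRegression(max_iter=1000).fit(Xs, ys)
    others = [p for p in projects if p != source]
    return float(np.mean([f1_score(projects[p][1],
                                   model.predict(projects[p][0]))
                          for p in others]))

scores = {src: transfer_score(src) for src in projects}
bellwether = max(scores, key=scores.get)   # the exemplary training project
print("per-project transfer F1:", {k: round(v, 3) for k, v in scores.items()})
print("bellwether project:", bellwether)
```

Once identified, the bellwether project serves as the fixed training set for predicting vulnerabilities in new or growing projects.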
- Research Article
- 10.1080/1206212x.2025.2452849
- Feb 1, 2025
- International Journal of Computers and Applications
The security of software has always been a vital concern for software developers. Infringements on software systems can result in significant losses in terms of time and confidential data. Software vulnerabilities are considered to be the gateway for attackers to harm information systems. Hence, it is crucial to build effective software vulnerability prediction models. Machine learning algorithms produce high-performance prediction models but the performance of these models is affected by the hyperparameter settings of machine learning methods and imbalanced datasets. The current study aims to find out the effect of single- and multi-objective hyperparameter optimization on software vulnerability prediction models. The paper has proposed an experimental methodology that considers both optimizations and uses eight machine learning methods on PHP open-source public datasets (Drupal, Moodle, and PHPMyAdmin). The experimental results show that after applying multi-objective hyperparameter optimization, the highest AUC achieved is 0.9774 and the F1-Score attained is 0.9222, which is better than the benchmark studies. Random Forest has performed satisfactorily for all three datasets. The comparative analysis of single and multi-objective HPO is performed using post hoc Tukey’s HSD test. Furthermore, the effectiveness of the resampling technique ‘SMOTE’ is observed.
- Conference Article
- 10.1109/apsec51365.2020.00011
- Dec 1, 2020
Context: Security is vital to software developed for commercial or personal use. Although more organizations are realizing the importance of applying secure coding practices, in many of them, security concerns are not known or addressed until a security failure occurs. The root cause of security failures is vulnerable code. While metrics have been used to predict software vulnerabilities, we explore the relationship between code and architectural smells and security weaknesses. As smells are surface indicators of a deeper problem in software, determining the relationship between smells and software vulnerabilities can play a significant role in vulnerability prediction models. Objective: This study explores the relationship between smells and software vulnerabilities to identify the smells associated with vulnerable code. Method: We extracted the class, method, file, and package level smells for three systems: Apache Tomcat, Apache CXF, and Android. We then compared their occurrences in the vulnerable classes, which were reported to contain vulnerable code, and in the neutral classes (non-vulnerable classes where no vulnerability had yet been reported). Results: We found that a vulnerable class is more likely to have certain smells compared to a non-vulnerable class. God Class, Complex Class, Large Class, Data Class, Feature Envy, and Brain Class have a statistically significant relationship with software vulnerabilities. We found no significant relationship between architectural smells and software vulnerabilities. Conclusion: We can conclude that for all the systems examined, there is a statistically significant correlation between software vulnerabilities and some smells.
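The kind of association test behind such a result can be sketched with a chi-squared test on a 2x2 contingency table. This is a hedged illustration only: the counts below are made up, and the paper does not specify that this exact test or these numbers were used.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: rows = classes with / without the
# God Class smell, columns = vulnerable / neutral (counts are made up)
table = [[30, 20],
         [15, 85]]

# A small p-value indicates the smell and vulnerability status are associated
stat, p, dof, expected = chi2_contingency(table)
print("chi2 = %.2f, dof = %d, p = %.3g" % (stat, dof, p))
```

Repeating such a test per smell (with a multiple-comparison correction) is one standard way to decide which smells have a statistically significant relationship with vulnerabilities.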