Abstract

Structured Query Language injection (SQLi) and Cross-Site Scripting (XSS) are the best-known kinds of input validation vulnerabilities. Recently, vulnerability prediction models based on machine learning have been gaining acceptance in the domain of Web security. Such models offer an easy and effective way of dealing with web application security concerns. However, most of them rely on complex graphs generated from source code or on regex patterns derived from expert knowledge. This paper proposes a method for extracting features from source code and predicting input validation vulnerabilities using machine learning algorithms. The proposed method extracts all features related to the flow of vulnerabilities within the programs and removes the features that are irrelevant to that flow. In addition, each vulnerability is assigned its context, which gives the model additional data for learning about the vulnerability context. Compared to other related methods, the proposed feature extraction method was found to have high reusability and better performance. The best model, based on the LSTM classifier, achieved a recall of 98.1%, a precision of 97.9%, an accuracy of 98.67%, and an area under the curve (AUC) of 99.03% on the test dataset.
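
For readers less familiar with the reported metrics, the following minimal sketch shows how precision, recall, and accuracy are conventionally derived from a binary confusion matrix. The function name `classification_metrics` and the counts passed to it are illustrative assumptions only; they are not taken from the paper's dataset or code.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp: vulnerable samples correctly flagged as vulnerable
    fp: safe samples incorrectly flagged as vulnerable
    tn: safe samples correctly left unflagged
    fn: vulnerable samples that were missed
    """
    total = tp + fp + tn + fn
    # Precision: of everything flagged, how much was truly vulnerable.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly vulnerable, how much was flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: share of all samples classified correctly.
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Hypothetical counts chosen for illustration, not the paper's results.
metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

AUC is not included here because it depends on the classifier's ranked scores rather than on a single set of confusion-matrix counts.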
