With the proliferation of web applications, cross-site scripting (XSS) attacks have increased significantly and now pose a serious threat to users' information security and privacy. Machine learning (ML) and deep learning (DL) techniques offer promising ways to improve XSS attack detection, but their effectiveness is limited by the lack of comprehensive and diverse datasets. Moreover, existing approaches often prioritize detection accuracy over real-time processing capability, which is essential for effective defense. To address these challenges, we propose a novel framework that automatically collects web resources, efficiently extracts informative features, and constructs an up-to-date XSS attack dataset, which is then used to train an ML-based XSS detection model. Using this framework, we created and published a well-structured dataset of over 100,000 samples for the research community. Furthermore, we present a hybrid detection model that leverages the strengths of both Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks. Extensive evaluations on our dataset demonstrate that the proposed model outperforms baseline ML models across various metrics, including processing rate. Notably, our model achieves an accuracy of 99.27% while maintaining a low false positive rate of 0.06% and a processing rate exceeding 1000 samples per second. These results highlight its high accuracy and robustness in detecting XSS attacks and its suitability for real-time applications. Our work presents a comprehensive solution for enhancing web application security by providing a diverse dataset and a high-accuracy detection model with low latency.
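The three headline metrics reported above (accuracy, false positive rate, and processing rate) can be computed for any binary detector along the following lines. This is a minimal sketch and not the evaluation code used in this work: the `evaluate` helper, the label convention (1 = XSS, 0 = benign), and the `toy_predict` keyword rule standing in for a trained model are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Sequence


def evaluate(predict: Callable[[Sequence[str]], Sequence[int]],
             samples: Sequence[str],
             labels: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, false positive rate, and throughput.

    Assumed label convention: 1 = XSS, 0 = benign.
    """
    # Time only the prediction call to estimate samples processed per second.
    start = time.perf_counter()
    preds = predict(samples)
    elapsed = time.perf_counter() - start

    # Confusion-matrix counts.
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)

    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # FPR = benign samples wrongly flagged / all benign samples.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "samples_per_second": len(samples) / elapsed if elapsed > 0 else float("inf"),
    }


def toy_predict(samples: Sequence[str]) -> Sequence[int]:
    # Hypothetical stand-in for a trained model: flags any input containing "<script".
    return [1 if "<script" in s.lower() else 0 for s in samples]


if __name__ == "__main__":
    demo_samples = [
        "<script>alert(1)</script>",
        "hello world",
        "<img src=x onerror=alert(1)>",
        "<b>bold</b>",
    ]
    demo_labels = [1, 0, 1, 0]
    print(evaluate(toy_predict, demo_samples, demo_labels))
```

On a real test set, the throughput figure depends on hardware and batch size, so it is only comparable across models when measured under the same conditions.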