Abstract

With the widespread use of Web applications, security issues in source code are increasing. Exposed vulnerabilities seriously endanger the interests of service providers and customers. Several models address this problem, but most of them rely on complex graphs generated from source code or on regex patterns based on expert experience. In this paper, we propose TAP, an analysis model based on a token mechanism and deep learning that discovers vulnerabilities in PHP: Hypertext Preprocessor (PHP) Web programs conveniently and easily. Building on the token mechanism of the PHP language, we designed a custom tokenizer that unifies tokens, supports some features of PHP and optimizes the parsing. The tokenizer also implements parameter iteration to achieve data flow analysis. We trained the deep learning model of TAP on the Software Assurance Reference Dataset (SARD) and the SQLI-LABS dataset by combining the word2vec model with a Long Short-Term Memory (LSTM) network. In the experiment on the CWE-89 dataset, TAP achieves an Area Under the Curve (AUC) of 0.9941, which is better than that of the other models, and also the highest accuracy, 0.9787. Further, TAP performs much better than RIPS in multiclass classification, with a Kappa of 0.8319 and a Hamming distance of 0.0840.
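The two multiclass metrics named above can be illustrated with a minimal Python sketch. It assumes that Kappa means Cohen's kappa and that the Hamming distance is the normalized fraction of mismatched labels. Neither definition is spelled out in the abstract, and the class labels below are hypothetical.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Fraction of samples where prediction and ground truth agree.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the two marginal label distributions.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both sides use a single identical label
    return (observed - expected) / (1.0 - expected)


def hamming_distance(y_true, y_pred):
    """Normalized Hamming distance: fraction of positions whose labels differ."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / n


# Hypothetical labels, for illustration only (not the paper's classes or data).
y_true = ["safe", "sqli", "xss", "sqli", "safe", "xss"]
y_pred = ["safe", "sqli", "sqli", "sqli", "safe", "xss"]
print(cohen_kappa(y_true, y_pred))
print(hamming_distance(y_true, y_pred))
```

Under these definitions, a higher Kappa and a lower Hamming distance both indicate better agreement with the ground truth.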

Highlights

  • At present, the Internet plays an important role in politics, economy, culture and social life

  • According to an F5 Labs study of 433 major malicious attack incidents spanning 12 years, Web applications are the origin of 53% of malicious attacks [4]

  • The custom tokenizer of TAP was compared with the inbuilt function of PHP: Hypertext Preprocessor (PHP) on the Common Weakness Enumeration (CWE)-89 dataset to prove the effectiveness of the custom tokenizer
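To give a rough sense of what unifying tokens can mean, here is a minimal Python sketch. It is not the TAP tokenizer and not PHP's inbuilt tokenizer: it is a toy regex lexer, and the placeholder names (`STR`, `NUM`, `VAR0`, ...) and the list of superglobals kept verbatim are assumptions made for illustration. Comments, heredocs and many other PHP features are ignored.

```python
import re

# Hypothetical token classes for this sketch; not the paper's vocabulary.
TOKEN_SPEC = [
    ("STR", r"'(?:\\.|[^'\\])*'" + "|" + r'"(?:\\.|[^"\\])*"'),  # quoted strings
    ("NUM", r"\d+(?:\.\d+)?"),                                   # numeric literals
    ("VAR", r"\$[A-Za-z_]\w*"),                                  # PHP variables
    ("NAME", r"[A-Za-z_]\w*"),                                   # identifiers, keywords
    ("OP", r"[^\s\w]"),                                          # any other symbol
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))

# Assumption: user-input sources stay verbatim so a model can still see them.
SUPERGLOBALS = {"$_GET", "$_POST", "$_COOKIE", "$_REQUEST"}


def unify_tokens(php_source):
    """Map literals and user-defined variable names to placeholder tokens."""
    var_ids = {}
    tokens = []
    for match in MASTER.finditer(php_source):
        kind, text = match.lastgroup, match.group()
        if kind in ("STR", "NUM"):
            tokens.append(kind)  # drop the literal value, keep its type
        elif kind == "VAR" and text not in SUPERGLOBALS:
            # The same variable always maps to the same numbered placeholder.
            index = var_ids.setdefault(text, len(var_ids))
            tokens.append(f"VAR{index}")
        else:
            tokens.append(text)
    return tokens


snippet = '$id = $_GET["id"]; $q = "SELECT * FROM t WHERE id=" . $id; mysql_query($q);'
print(unify_tokens(snippet))
```

Replacing concrete names and literals with placeholders shrinks the vocabulary, so two snippets that differ only in naming yield the same token sequence.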


Summary

Introduction

The Internet plays an important role in politics, economy, culture and social life. Various security issues arise in different emerging Internet environments, such as Internet applications [1], cloud computing [2] and crowdsourcing [3]. With the rapid growth of open-source website applications, cyber threats are emerging. According to an F5 Labs study of 433 major malicious attack incidents spanning 12 years, Web applications are the origin of 53% of malicious attacks [4]. To resist intruders effectively, the security of Web applications must be ensured first. Of the Alexa top 10 million websites, 55.2% are built on a content management system (CMS). WordPress, Joomla! and Drupal are the top three most popular CMSs, and their

