Abstract

A webshell is a malicious backdoor that gives an attacker remote access to and control of a web server by executing arbitrary commands. The wide use of obfuscation and encryption techniques has greatly increased the difficulty of webshell detection. To address this, we propose a novel webshell detection model that leverages grammatical features extracted from PHP code. The key idea is to combine the executable data characteristics of the PHP code with static text features for webshell classification. To verify the proposed model, we construct a cleaned webshell data set consisting of 2,917 samples from 17 webshell collection projects and conduct extensive experiments. We design three sets of controlled experiments. The results show that all three algorithms reach an accuracy above 99.40%, with a maximum of 99.66%; the recall rate increases by at least 1.8% and by as much as 6.75%; and the F1 value increases by 2.02% on average. These results not only confirm the effectiveness of the grammatical features in webshell detection but also show that our system significantly outperforms several state-of-the-art rivals in terms of detection accuracy and recall rate.

Highlights

  • A webshell is a web backdoor written in a web scripting language that provides a covert way to communicate with the server [1]

  • The results show that the executable data characteristics of PHP code are among the important grammatical features of PHP webshells and significantly improve the accuracy of the detection model

  • We propose a webshell detection model based on static features of PHP code


Summary

Introduction

A webshell is a web backdoor written in a web scripting language that provides a covert way to communicate with the server [1]. Dynamic feature detection is mainly based on webshell file behavior [3], webshell communication traffic [4,5,6], and other characteristics. It only works when the webshell is dynamically executed. Static feature detection methods mainly analyze webshell text content as well as web log information [9, 10]. From the perspective of the grammatical features of webshells, (1) we construct a webshell detection model based on the executable data characteristics of PHP code, and (2) we construct a cleaned data set, collected from 17 existing webshell data sets on GitHub, to facilitate subsequent related research. The results show that the executable data characteristics of PHP code are among the important grammatical features of PHP webshells and significantly improve the accuracy of the detection model.
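To make the idea of pairing static text features with execution-related features more concrete, the following is a minimal Python sketch. It is not the paper's feature extractor: this excerpt does not define how the executable data characteristics are computed, so the sink list `EXEC_SINKS`, the regular expressions, and the feature names are all assumptions chosen for illustration. The sketch counts calls to PHP constructs that can run data as code or commands and records two simple text statistics alongside them.

```python
import re

# Assumed (hypothetical) list of PHP constructs that can execute data as code
# or as shell commands. The paper's actual feature definition may differ.
EXEC_SINKS = ("eval", "assert", "system", "exec", "shell_exec",
              "passthru", "popen", "proc_open")
SINK_ALT = "|".join(EXEC_SINKS)

# A sink name followed by "(", not preceded by an identifier character,
# "$" (variable) or ">" (method call such as $db->exec(...)).
SINK_CALL = re.compile(r"(?<![\w$>])(" + SINK_ALT + r")\s*\(", re.IGNORECASE)

# A sink call whose argument list (up to the next ";") mentions request data.
REQUEST_FED = re.compile(
    r"(?<![\w$>])(?:" + SINK_ALT + r")\s*\([^;]*\$_(?:GET|POST|REQUEST|COOKIE)\b",
    re.IGNORECASE,
)

# Crude identifier tokenizer used for the static text statistics.
TOKEN = re.compile(r"[A-Za-z_]\w*")


def extract_features(php_source: str) -> dict:
    """Return a small feature dictionary for one PHP source string."""
    tokens = TOKEN.findall(php_source)
    sink_calls = SINK_CALL.findall(php_source)
    return {
        # Static text features (illustrative only).
        "n_tokens": len(tokens),
        "longest_token": max((len(t) for t in tokens), default=0),
        # Execution-related features (illustrative proxy).
        "n_exec_sinks": len(sink_calls),
        "exec_on_request_data": int(bool(REQUEST_FED.search(php_source))),
    }


if __name__ == "__main__":
    suspicious = '<?php @eval($_POST["cmd"]); ?>'
    benign = '<?php $db->exec($sql); echo htmlspecialchars($name); ?>'
    print(extract_features(suspicious))
    print(extract_features(benign))
```

A dictionary like this could be turned into a numeric vector and passed to any standard classifier. Regular expressions are used here only to keep the sketch short; a real system would more plausibly work on the PHP token stream or syntax tree, since obfuscated webshells often hide sink names behind string concatenation or variable functions.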

Related Work
Model Architecture
Experimental Analysis
Evaluation index
Conclusion