Abstract

WebShell is a common network backdoor attack characterized by high concealment and great harm. Conventional WebShell detection methods can no longer cope with the complex and flexible variations of WebShell attacks. Therefore, this paper proposes a deep super learner for attack detection. First, the collected data are deduplicated to prevent duplicate data from influencing the result. Second, static and dynamic features are taken as the input features of the algorithm to construct a comprehensive feature set. We then use the Word2Vec algorithm to vectorize the features. At this stage, to prevent an explosion in the number of features, we use a genetic algorithm to select the effective feature dimensions. Finally, we use a deep super learner to detect WebShell. The experimental results show that this algorithm can effectively detect WebShell, and its accuracy and recall are greatly improved.
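
The deduplication step mentioned in the abstract could be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that two samples are duplicates when their raw bytes are identical, and the function name `deduplicate` and the sample file names are hypothetical.

```python
import hashlib


def deduplicate(samples):
    """Keep the first occurrence of each distinct file content.

    `samples` is a list of (name, content_bytes) pairs. Exact byte
    equality is an assumed duplicate criterion for this sketch.
    """
    seen = set()
    unique = []
    for name, content in samples:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, content))
    return unique


# Hypothetical sample set: the third file repeats the first one.
samples = [
    ("a.php", b"<?php echo 1; ?>"),
    ("b.php", b"<?php echo 2; ?>"),
    ("copy_of_a.php", b"<?php echo 1; ?>"),
]
unique = deduplicate(samples)
print([name for name, _ in unique])
```

Hashing the content rather than comparing file names means that renamed copies of the same script are also collapsed into one sample.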

Highlights

  • With the development of Internet technology, web-based applications have been assimilated into all aspects of our lives

  • Research on Webshell Detection Method Based on Machine Learning [5] proposes using opcodes to extract the dynamic features of WebShell files and using TF-IDF

  • SMOTE reduces the misjudgments caused by imbalanced datasets; a genetic algorithm removes irrelevant or redundant features; and the deep super learner overcomes the limitations of a single algorithm, so that the algorithm can achieve the best expected results
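
The oversampling idea behind SMOTE, referred to in the last highlight, can be illustrated with a short sketch. This is a simplified version of the general technique under stated assumptions, not the configuration used in the paper: the function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy data are all hypothetical. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by interpolation.

    `minority` is a list of feature vectors (lists of floats) and must
    contain at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        # Interpolate between x and the chosen neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point lies between two real minority samples, the new data stay inside the region the minority class already occupies instead of simply duplicating existing rows.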


Summary

Introduction

With the development of Internet technology, web-based applications have been assimilated into all aspects of our lives. As the number of backdoors implanted in websites increases year by year, detecting these backdoors is critical for data security. Malicious WebShell files can function as website backdoors, so detecting WebShell files on websites is very important. WebShell attacks can be divided into two categories: “large Trojan” files and “micro Trojan” files. A “micro Trojan” file is small, usually a few to dozens of lines of code, and its main function is to assist in uploading “large Trojan” files and executing script commands. A “large Trojan” file is much larger, sometimes exceeding 1 MB, and its functions are complex, including executing command-line programs and database operations. To complete its function, it can cooperate with other attack files to operate jointly and achieve the purpose of the attack.

Symmetry 2020, 12, 1406

Related Work
Opcode
Static Character of the String Length Variance
Static Character of the Index of Coincidence
Static Character of Information Entropy
Static Character of the File Compression Ratio
Static Character of Eigencode Matching
Feature Vectorization
Feature Selection
Data Sampling Based on the SMOTE Algorithm
Deep Super Learner
Research Features
Experimental Data
Evaluation Standard
Comparison with Other Algorithms
Conclusions
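
Several of the static features named in the section list above (string length variance, index of coincidence, information entropy, file compression ratio) have common textbook definitions that can be sketched in a few lines. The code below uses those general definitions; the exact formulas, tokenization, and compressor used in the paper are not given here, so the function names, the whitespace tokenization, and the choice of zlib are assumptions made for illustration.

```python
import math
import zlib
from collections import Counter


def information_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def index_of_coincidence(data: bytes) -> float:
    """Probability that two bytes drawn without replacement are equal."""
    n = len(data)
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in Counter(data).values()) / (n * (n - 1))


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (zlib is an assumed choice)."""
    if not data:
        return 0.0
    return len(zlib.compress(data)) / len(data)


def string_length_variance(text: str) -> float:
    """Population variance of whitespace-separated token lengths."""
    lengths = [len(token) for token in text.split()]
    if not lengths:
        return 0.0
    mean = sum(lengths) / len(lengths)
    return sum((length - mean) ** 2 for length in lengths) / len(lengths)


sample = b"<?php echo 'hello world'; ?>"
print(information_entropy(sample), index_of_coincidence(sample))
print(compression_ratio(sample), string_length_variance(sample.decode()))
```

Encoded or obfuscated scripts tend to have a flatter byte distribution than hand-written code, which is why entropy, index of coincidence, and compressibility are natural candidates for static features.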