Abstract

Malicious software, i.e., malware, has been a persistent threat in the information security landscape since the early days of personal computing. Recent targeted attacks make extensive use of non-executable malware as a stealthy attack vector. A substantial body of previous work addresses the detection of non-executable malware, including static, dynamic, and combined methods. While static methods perform orders of magnitude faster, their applicability has hitherto been limited to specific file formats. This paper introduces Hidost, the first static machine-learning-based malware detection system designed to operate on multiple file formats. Extending a previously published, highly effective method, it combines the logical structure of files with their content for even better detection accuracy. Our system has been implemented and evaluated on two formats, PDF and SWF (Flash). Thanks to its modular design and general feature set, it is extensible to other formats whose logical structure is organized as a hierarchy. Evaluated in realistic experiments on timestamped datasets comprising 440,000 PDF and 40,000 SWF files collected over several months, Hidost outperformed all antivirus engines deployed by the website VirusTotal, detecting the highest number of malicious PDF files, and ranked among the best on SWF malware.
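To make the idea of features drawn from a file's hierarchical logical structure more concrete, the following is a minimal sketch, not Hidost's actual feature extraction, which this section does not describe. It assumes a parsed document is already available as nested Python dicts and lists; the names `structural_paths`, `to_feature_vector`, and the toy document are hypothetical and chosen only for illustration.

```python
from typing import Any, Iterator, List, Tuple


def structural_paths(node: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, leaf_value) for every leaf of a nested dict/list hierarchy."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from structural_paths(child, f"{prefix}/{key}")
    elif isinstance(node, list):
        # List positions are ignored, so sibling elements share one path.
        for child in node:
            yield from structural_paths(child, prefix)
    else:
        yield prefix, node


def to_feature_vector(doc: Any, vocabulary: List[str]) -> List[int]:
    """Binary vector: 1 if a vocabulary path occurs in the document, else 0."""
    present = {path for path, _ in structural_paths(doc)}
    return [1 if path in present else 0 for path in vocabulary]


# Hypothetical toy hierarchy loosely resembling a PDF object tree.
toy_doc = {
    "Root": {
        "Pages": {"Count": 2, "Kids": [{"Type": "Page"}, {"Type": "Page"}]},
        "OpenAction": {"JS": "app.alert(1)"},
    }
}

paths = sorted({path for path, _ in structural_paths(toy_doc)})
vector = to_feature_vector(toy_doc, ["/Root/OpenAction/JS", "/Root/AcroForm"])
```

Because the sketch only requires a hierarchy, the same two functions would apply unchanged to any format whose parser produces a tree, which is the property the abstract relies on when it claims extensibility beyond PDF and SWF.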

Highlights

  • One of the most effective tools for breaking into computer systems remains malicious software, i.e., malware

  • Experimental results of different methods operating on Portable Document Format (PDF) and SWF data are illustrated in Figs. 15 and 16, respectively

  • The methods are compared on four performance indicators typical for classification tasks: true positive rate (TPR), false positive rate (FPR), accuracy, and area under the receiver operating characteristic curve (AUROC)
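As a reminder of how the four indicators are defined, here is a small self-contained sketch using their standard textbook definitions. It is not taken from the paper's evaluation code; the function names and the toy labels and scores are assumptions made for illustration. Labels use 1 for malicious and 0 for benign.

```python
from typing import Dict, Sequence


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """TPR, FPR, and accuracy from binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "tpr": tp / (tp + fn),                     # detected malicious / all malicious
        "fpr": fp / (fp + tn),                     # flagged benign / all benign
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as one half. This pairwise form is O(P * N) and is meant for
    illustration only, not for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: three malicious and three benign samples with classifier scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.1, 0.8, 0.3]
predictions = [1 if s >= 0.5 else 0 for s in scores]

metrics = threshold_metrics(labels, predictions)
area = auroc(labels, scores)
```

TPR, FPR, and accuracy depend on the chosen decision threshold (0.5 in the toy data), whereas AUROC summarizes ranking quality across all thresholds, which is why the two kinds of indicator are usually reported together.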


Summary

Introduction

One of the most effective tools for breaking into computer systems remains malicious software, i.e., malware. Although a well-known plague since the dawn of personal computing, malware has developed several insidious traits over the past decade to serve the needs of criminal business. One of them is the infection of files in well-known formats used to exchange documents between businesses and individuals. Such infection offers the following benefits to attackers:

  1.

  2. A steady stream of new vulnerabilities has been observed in recent years in document viewers, owing to their high complexity, which is caused in turn by the complexity of document formats.

  3. The flexibility and versatility of document formats offer ample opportunities for obfuscation of embedded malicious content.
