Keeping pace with the creation of new malicious PDF files using an active-learning based detection framework

Nir Nissim,Asaf Shabtai,Yuval Elovici,Robert Moskovitch,Matan Edri,Aviad Cohen,Oren Barad

doi:10.1186/s13388-016-0026-3


Open Access

https://doi.org/10.1186/s13388-016-0026-3


Abstract

Attackers increasingly take advantage of naive users who tend to treat non-executable files casually, as if they were benign. Such users often open non-executable files even though these files can conceal and perform malicious operations. Existing defensive solutions used by organizations prevent executable files from entering organizational networks via web browsers or email messages. Therefore, recent advanced persistent threat attacks tend to leverage non-executable files such as Portable Document Format (PDF) documents, which organizations use daily. Machine learning (ML) methods have recently been applied to detect malicious PDF files; however, these techniques lack an essential element: they cannot be efficiently updated daily. In this study we present an active learning (AL) based framework, specifically designed to help anti-virus vendors focus their analytical efforts on acquiring novel malicious content. This focus is accomplished by identifying and acquiring both new PDF files that are most likely malicious and informative benign PDF documents. These files are used for retraining and enhancing the knowledge stores of both the detection model and the anti-virus. We propose two AL based methods: exploitation and combination. Our methods are evaluated and compared to an existing AL method (SVM-margin) and to random sampling over 10 days. Results indicate that on the last day of the experiment, combination outperformed all of the other methods, enriching the signature repository of the anti-virus with almost seven times more new malicious PDF files, while improving the detection model’s capabilities further each day. At the same time, it reduces security experts’ efforts by 75%. Despite this significant reduction, results also indicate that our framework detects new malicious PDF files better than leading anti-virus tools commonly used by organizations for protection against malicious PDF files.
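The abstract names four acquisition strategies (random sampling, SVM-margin, exploitation, and combination) without defining them. The following is a minimal sketch of how such strategies might choose which unlabeled PDF files to send to a security expert each day. It is not the authors' implementation. The function name `select_for_labeling`, the score convention (signed distance from the decision boundary, positive meaning "predicted malicious"), the reading of exploitation as "most confidently malicious", and the half-and-half budget split used for combination are all assumptions made for illustration.

```python
import random


def select_for_labeling(scores, budget, strategy, seed=0):
    """Return indices of the files to send to a human expert.

    scores: signed distances from the classifier's decision boundary.
            Positive means "predicted malicious" (assumed convention).
    budget: number of files the expert can label in one day.
    """
    indices = list(range(len(scores)))

    if strategy == "random":
        # Baseline: pick files without consulting the model.
        return random.Random(seed).sample(indices, budget)

    if strategy == "margin":
        # SVM-margin: files closest to the boundary, where the model is least certain.
        return sorted(indices, key=lambda i: abs(scores[i]))[:budget]

    if strategy == "exploitation":
        # Assumed reading: files the model most confidently predicts to be malicious.
        return sorted(indices, key=lambda i: -scores[i])[:budget]

    if strategy == "combination":
        # Assumed reading: spend half the budget on margin, the rest on exploitation.
        half = budget // 2
        chosen = select_for_labeling(scores, half, "margin")
        ranked = sorted(indices, key=lambda i: -scores[i])
        rest = [i for i in ranked if i not in chosen]
        return chosen + rest[: budget - half]

    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical decision scores for six unlabeled PDF files.
scores = [2.5, -0.1, 0.05, -3.0, 1.2, 0.4]
for name in ("margin", "exploitation", "combination"):
    print(name, select_for_labeling(scores, 2, name))
```

In this sketch, margin sampling favors uncertain files, which tend to improve the detection model, while exploitation favors likely-malicious files, which tend to enrich a signature repository. That trade-off is the one the abstract describes, but the exact selection rules in the paper may differ.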

Highlights

Cyber-attacks aimed at organizations have increased since 2009, with 91% of all organizations hit by cyberattacks in 2013 (see http://www.humanipo.com/news/37983/91-of-organisations-hit-by-cyber attacks-in-2013/). Attacks aimed at organizations usually include harmful activities such as stealing confidential information, spying and monitoring an organization, and disrupting an organization’s actions.
Before we provide a review of existing techniques and known methods of attack, it is worthwhile to mention that Adobe Reader version X, released in 2011, offers a new feature called Protected Mode Adobe Reader (PMAR).
The number of new malicious Portable Document Format (PDF) files is 128, since the initial detection model was trained on an initial set of 574 labeled PDF files that contained 128 malicious files.

Summary

Introduction

Cyber-attacks aimed at organizations have increased since 2009, with 91% of all organizations hit by cyberattacks in 2013 (see http://www.humanipo.com/news/37983/91-of-organisations-hit-by-cyber attacks-in-2013/). Attacks aimed at organizations usually include harmful activities such as stealing confidential information, spying and monitoring an organization, and disrupting an organization’s actions. Email has become a very attractive platform from which to initiate cyber-attacks against organizations. Attackers often use social engineering to encourage recipients to click a link or open a malicious web page or attachment. Before we provide a review of existing techniques and known methods of attack, it is worthwhile to mention that Adobe Reader version X, released in 2011, offers a new feature called Protected Mode Adobe Reader (PMAR). Protected mode uses a sandbox technique to create an isolated environment in which the Acrobat Reader rendering agent runs while reading a PDF file. However, most organizations are not up to date with the newest versions of software, including PDF readers, so they remain exposed to many well-known attacks that exploit vulnerabilities in previous versions of Adobe Reader. In order to explain how PDF files can be exploited when created or manipulated by an attacker, we first describe the structure of a viable PDF file.



Journal: Security Informatics	Publication Date: Feb 18, 2016
Citations: 32	License type: CC BY 4.0
