PDF Malware Detection Based on Optimizable Decision Trees

Qasem Abu Al-Haija, Hazem Qattous, Ammar Odeh

doi:10.3390/electronics11193142

Open Access

https://doi.org/10.3390/electronics11193142

Journal: Electronics	Publication Date: Sep 30, 2022
Citations: 22	License type: CC BY 4.0

Affiliation: Princess Sumaya University for Technology

Abstract

Portable Document Format (PDF) files are among the most widely used file types. Their ubiquity has incentivized attackers to turn these normally harmless files into infection vectors, usually by hiding malicious code inside PDF documents to infect victims' machines. The resulting PDF malware calls for techniques that distinguish benign files from malicious ones. Research studies indicate that machine learning methods provide efficient detection techniques against such malware. In this paper, we present a new detection system that analyzes PDF documents to distinguish benign PDF files from malicious ones. The proposed system uses an AdaBoost decision tree with optimized hyperparameters, trained and evaluated on a modern, inclusive dataset, Evasive-PDFMal2022. The experimental evaluation demonstrates a lightweight and accurate PDF detection system that achieves 98.84% prediction accuracy with a short prediction time of 2.174 μs. The proposed model thus outperforms other state-of-the-art models in the same study area. Hence, the proposed system can be used effectively to uncover PDF malware with high detection performance and low detection overhead.
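
To make the boosting idea named in the abstract concrete, the following is a minimal sketch of AdaBoost over depth-1 decision trees (decision stumps). It is not the authors' implementation, and it does not use Evasive-PDFMal2022: the two features (an embedded-JavaScript count and a page count), the eight synthetic samples, and all function names are assumptions chosen purely for illustration. The number of boosting rounds (`n_rounds`) stands in for the kind of hyperparameter that an optimization step would tune.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) for the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # The stump predicts `sign` when feature j exceeds thr, else -sign.
                preds = [sign if row[j] > thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] > thr else -sign

def adaboost_fit(X, y, n_rounds=3):
    """Fit an ensemble of weighted stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Weighted vote of all stumps: +1 (malicious) or -1 (benign)."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Synthetic toy data (hypothetical features): [javascript_count, page_count].
X = [[0, 12], [0, 3], [1, 30], [0, 8], [3, 1], [2, 1], [5, 9], [1, 1]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]  # -1 = benign, +1 = malicious

model = adaboost_fit(X, y, n_rounds=3)
train_accuracy = sum(adaboost_predict(model, r) == t for r, t in zip(X, y)) / len(X)
print(f"training accuracy: {train_accuracy:.2f}")
```

No single threshold on either toy feature separates the two classes, so the example relies on the weighted vote of several stumps. A realistic evaluation would use held-out data and a search over hyperparameters such as the number of rounds, which this sketch omits.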

Similar Papers

MSL: Mining published scientific literature for the extraction and classification of text and images to support IR capabilities
Ahmed Zeeshan ... Zeeshan Saman
Frontiers in Neuroinformatics | VOL. 10
01 Jan 2015

Creating a more productive, clutter-free, paperless office: a primer on scanning, storage and searching of PDF documents on personal computers
L Citrome
International Journal of Clinical Practice | VOL. 62
01 Feb 2008

Automatic Language Identification and Content Separation from Indian Multilingual Documents Using Unicode Transformation Format
Rajnish M Rakholia ... Jatinderkumar R Saini
24 Aug 2016

Ensemble Classification System for Scientific Chart Recognition from PDF Files
S Nagarajan ... V Karthikeyani
International Journal of Computer Vision and Image Processing | VOL. 2
01 Oct 2012
