XAI-PDF: A Robust Framework for Malicious PDF Detection Leveraging SHAP-Based Feature Engineering

Mustafa Al-Fayoumi, Christine Amareen, Qasem Abu Al-Haija, Rakan Armoush

doi:10.34028/iajit/21/1/12

Abstract

As malicious PDF files are increasingly used in cyberattacks, efficient and accurate classifiers are needed to detect and block them. Machine Learning (ML) models have proven effective at detecting malicious PDF files. This paper presents XAI-PDF, an efficient system for malicious PDF detection designed to improve accuracy and minimize decision-making time on a modern dataset, Evasive-PDFMal2022. The proposed method optimizes classifier performance through feature engineering guided by SHapley Additive exPlanations (SHAP). The model development approach comprises four phases: data preparation, model building, model explainability, and feature derivation. Using the interpretability of SHAP values, crucial features are identified and new ones are derived, yielding an improved classification model and showing that interpretable AI techniques can enhance model performance. Several interpretable ML models were implemented, with the Light Gradient Boosting Machine (LGBM) outperforming the other classifiers. An Explainable Artificial Intelligence (XAI) global surrogate model was used to generate explanations for the LGBM predictions. Experimental comparisons with baseline methods show that XAI-PDF achieves higher accuracy, precision, and F1-score (99.9%, 100%, and 99.89%, respectively) with minimal False Positive (FP) and False Negative (FN) rates (0.000 and 0.002, respectively). Additionally, XAI-PDF requires only 1.36 milliseconds per record for prediction and is more resilient in detecting evasive malicious PDF files than state-of-the-art methods.
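The SHAP-guided feature engineering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not give the feature names, the derivation rule, or the model settings, so everything below is an assumption for illustration. The feature names (`js_count`, `open_action`, `page_count`), the `toy_model` scorer standing in for LGBM, and the choice to derive a new feature as the product of the two highest-ranked features are all hypothetical. Shapley values are computed exactly by enumerating feature coalitions, which is only practical for a handful of features; a real workflow would more likely use the SHAP library with a tree model.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names; the abstract does not list the dataset's columns.
FEATURES = ["js_count", "open_action", "page_count"]


def toy_model(x):
    """Stand-in scorer (NOT the paper's LGBM): higher means more suspicious."""
    js, oa, pages = x
    return 2.0 * js + 1.0 * oa + 3.0 * js * oa - 0.1 * pages


def value(model, x, background, subset):
    """Mean model output when features in `subset` come from x, the rest from background rows."""
    total = 0.0
    for b in background:
        mixed = [x[i] if i in subset else b[i] for i in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values for one row, enumerating every coalition of the other features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                gain = value(model, x, background, s | {i}) - value(model, x, background, s)
                phi[i] += weight * gain
    return phi


def rank_features(model, rows, background):
    """Rank feature indices by mean absolute Shapley value (a global importance score)."""
    n = len(rows[0])
    sums = [0.0] * n
    for x in rows:
        for i, p in enumerate(shapley_values(model, x, background)):
            sums[i] += abs(p)
    means = [s / len(rows) for s in sums]
    order = sorted(range(n), key=lambda i: means[i], reverse=True)
    return order, means


def add_interaction(rows, i, j):
    """Derive one new feature per row as the product of features i and j (assumed rule)."""
    return [list(r) + [r[i] * r[j]] for r in rows]


# Tiny made-up dataset: [js_count, open_action, page_count]
rows = [[0, 0, 5], [1, 1, 1], [3, 1, 2], [0, 1, 10], [2, 0, 3]]

order, importance = rank_features(toy_model, rows, rows)
top_two = order[:2]
augmented = add_interaction(rows, *top_two)

print("ranking:", [FEATURES[i] for i in order])
print("derived from:", [FEATURES[i] for i in top_two])
```

A useful sanity check on any Shapley computation is the efficiency property: for each row, the values sum to the model output for that row minus the mean output over the background data. In a full workflow, the model would be retrained on the augmented rows and compared against the original on held-out data.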


Journal: The International Arab Journal of Information Technology
Publication Date: Jan 1, 2024
Citations: 1

Similar Papers

Optimisation and interpretation of machine and deep learning models for improved water quality management in Lake Loktak
Swapan Talukdar ... Atiqur Rahman
Journal of Environmental Management | VOL. 351
25 Dec 2023

Understanding machine learning predictions of wastewater treatment plant sludge with explainable artificial intelligence.
Fuad Bin Nasir ... Jin Li
Water Environment Research: A Research Publication of the Water Environment Federation | VOL. 96
25 Sep 2024

Nuclear Magnetic Resonance Chemical Shift As Highly Explainable Chemical Structure Fingerprints for Anion Exchange Membrane Polymers
Yin Kan Phua ... Koichiro Kato
Electrochemical Society Meeting Abstracts | VOL. MA2023-02
22 Dec 2023

Interpretable Machine Learning Models for Malicious Domains Detection Using Explainable Artificial Intelligence (XAI)
Nida Aslam ... Reham Baageel
Sustainability | VOL. 14
16 Jun 2022
