Abstract

Portable Document Format (PDF) is a file format designed for producing portable, printable documents across platforms. PDF is one of the most widely used file types in computer-based systems, and its rich functionality has attracted users around the world. The same functionality gives malware developers several ways to exploit PDF files: a malicious PDF can carry embedded files, JavaScript, other PDF files, and similar payloads. As a result, PDF files expose computer-based systems to security vulnerabilities. In this study, we used the CIC-Evasive-PDFMal2022 dataset, released by the Canadian Institute for Cybersecurity in 2022, which contains two classes: benign and malicious. In the preprocessing step, the proposed model transforms the text-based PDF parameter data into 2D PDF417 barcode images. Three 2D Convolutional Neural Network (CNN) models (MobileNetV2, ResNet18, and ShuffleNet) are then trained on the resulting image dataset. A CNN is a type of artificial neural network used for image recognition, processing, and classification. Each CNN model yields a type/class-based feature set. In the last step, a metaheuristic optimization method, the Honey Badger Algorithm, selects the best-performing feature set among those extracted from the CNN models. The selected feature set is then classified with softmax, achieving an overall accuracy of 99.73%. The proposed approach shows that 2D CNNs can be trained successfully on 1D data. In addition, the barcode imaging technique prevents users from reading the data directly.
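
The 1D-to-2D preprocessing idea can be illustrated with a minimal sketch. It is not the authors' implementation and it does not produce a real PDF417 symbol: it has no codewords, error correction, or start/stop patterns. It only shows, under stated assumptions, how one text-based feature row might be serialised and packed into a 2D binary grid that an image model could consume. The function names, the grid width, and the feature names and values are all hypothetical.

```python
import math
from typing import Dict, List


def record_to_text(record: Dict[str, int]) -> str:
    """Serialise one feature row as 'key=value' pairs in sorted key order."""
    return ";".join(f"{key}={record[key]}" for key in sorted(record))


def text_to_bit_grid(text: str, width: int = 32) -> List[List[int]]:
    """Pack the UTF-8 bits of `text` row by row into a zero-padded 2D grid.

    Stand-in for barcode rendering only; this is NOT PDF417 encoding.
    """
    # Expand every byte into its 8 bits, most significant bit first.
    bits = [
        (byte >> shift) & 1
        for byte in text.encode("utf-8")
        for shift in range(7, -1, -1)
    ]
    height = max(1, math.ceil(len(bits) / width))
    # Pad with zeros so the last row is complete.
    bits += [0] * (height * width - len(bits))
    return [bits[row * width:(row + 1) * width] for row in range(height)]


# Hypothetical feature names and values, for illustration only.
sample = {"pages": 3, "javascript": 1, "embedded_files": 0}
text = record_to_text(sample)
grid = text_to_bit_grid(text)
```

In a real pipeline, the grid would be replaced by an image rendered with an actual PDF417 encoder, then resized to the input shape each CNN expects.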
