Abstract

The number of malicious software applications, or malware programs, increases every year. Their development grows more sophisticated as new techniques are used to bypass program-scanning software such as antivirus products. Consequently, deep learning-based methods have emerged as a promising new way to identify these threats. Our main contribution in this work is to propose and implement an approach that tackles both binary and multiclass malware classification. We use unsupervised word embedding algorithms to represent the software applications under analysis and a long short-term memory (LSTM) network to classify them. To evaluate our pipeline, we introduce a new dataset for binary and multiclass malware classification, because we could not find a large existing dataset with enough samples of cleanware and of the various malware types to evaluate multiclass classification models. In our experiments, the pipeline reached an accuracy of 88.94% for binary classification and 75.13% for multiclass classification. These results suggest that the proposed dataset is challenging and that using it can help train better malware classifiers, improving security.
