Detecting Unknown Malware from ASCII Strings with Natural Language Processing Techniques

Ryo Ito, Mamoru Mimura

doi:10.1109/asiajcis.2019.00-12

Abstract

Attackers often use executable files (malware) as tools to obtain sensitive information from specific companies and individuals. Anti-virus software attempts to detect malware with methods such as pattern matching, but these methods have difficulty detecting unknown malware. Unknown malware can instead be detected with techniques such as sandboxing; however, because a sandbox takes a long time to run, we consider another approach. ASCII strings extracted from executable files are helpful for analyzing malware, and recent developments in natural language processing (NLP) techniques are making it possible to use these strings for malware detection. In this paper, we propose a malware detection method that applies NLP techniques to ASCII strings. Our method divides these strings into words and distinguishes benign from malicious executable files by the differences in their words. Because NLP techniques compare the arrangement or frequency of words, uncommon words are unnecessary; we therefore expect that removing uncommon words improves the detection rate. Our method converts a corpus of frequent words into feature vectors with NLP techniques. In our experiments, we used a dataset containing more than 23,000 malware samples (from more than 2,100 malware families) provided by FFRI and more than 16,000 benign files collected from download.cnet.com. Our method achieves an F-measure of more than 0.85. The experimental results show that our method detects unknown malware with high accuracy.
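The preprocessing steps named in the abstract (extracting ASCII strings, dividing them into words, and keeping only frequent words) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The minimum string length `MIN_LEN`, the word pattern, the function names, and the plain word-count vector are all assumptions made for illustration; the abstract does not say which NLP model produces the feature vectors.

```python
import re
from collections import Counter

# Assumed threshold: runs of at least 4 printable ASCII bytes count as a string
# (similar in spirit to the Unix `strings` tool; not a value from the paper).
MIN_LEN = 4
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# Assumed tokenization: words are runs of letters, digits, and underscores.
WORD = re.compile(r"[A-Za-z0-9_]+")


def extract_words(data: bytes) -> list[str]:
    """Extract printable ASCII strings from raw bytes and split them into lowercase words."""
    words = []
    for run in ASCII_RUN.findall(data):
        words.extend(w.lower() for w in WORD.findall(run.decode("ascii")))
    return words


def build_vocabulary(documents: list[list[str]], top_k: int) -> list[str]:
    """Keep only the top_k most frequent words across all documents, dropping uncommon ones."""
    counts = Counter(w for doc in documents for w in doc)
    return [w for w, _ in counts.most_common(top_k)]


def to_vector(words: list[str], vocabulary: list[str]) -> list[int]:
    """Stand-in feature vector: the count of each vocabulary word in one file."""
    counts = Counter(words)
    return [counts[w] for w in vocabulary]


# Hypothetical byte content standing in for an executable file.
sample = b"\x00\x01CreateFileA\x00ab\x00kernel32.dll\xff\xfeLoadLibrary kernel32\x00"
sample_words = extract_words(sample)
vocabulary = build_vocabulary([sample_words], top_k=2)
vector = to_vector(sample_words, vocabulary)
```

In practice the vocabulary would be built from the training files only, and the resulting vectors would be passed to a classifier. Both of those choices are outside what the abstract specifies.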

Similar Papers

Applying NLP techniques to malware detection in a practical environment
Mamoru Mimura ... Ryo Ito
International Journal of Information Security | VOL. 21
06 Jun 2021

A comprehensive investigation of natural language processing techniques and tools to generate automated test cases
Imran Ahsan ... Muhammad Waseem Anwar
22 Mar 2017

Natural Language Processing Utilisation in Healthcare
S Vani ... Palvadi Srinivas Kumar
04 Feb 2022

Language Learning Research at the Intersection of Experimental, Computational, and Corpus‐Based Approaches
Patrick Rebuschat ... Detmar Meurers
Language Learning | VOL. 67
01 Jun 2017
