Abstract
Recently, several graph-based methods have been proposed for malware detection. However, modern malware typically exhibits sophisticated behaviors, which makes graph-based malware detection extremely challenging. To address this issue, we propose a graph repartition algorithm that transforms API call graphs into fragment behaviors based on programs' dynamic execution traces. The proposed algorithm relies on the N-order subgraph (NSG) to construct appropriate fragment behaviors. Moreover, we improve a term frequency-inverse document frequency (TF-IDF)-like measure and information gain (IG) to extract crucial N-order subgraphs (CNSGs). This novel behavioral representation and improved extraction method can accurately represent the crucial behaviors of malware. Experiments on 4,400 samples demonstrate that the proposed method achieves a high accuracy of 99.75% in malware detection and a promising performance of 95.27% in malware classification.
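The idea of cutting a call graph into N-order subgraphs can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the API call graph has an edge a -> b whenever API b is called immediately after API a in a dynamic trace, and that an NSG is a simple directed path of exactly N APIs in that graph; the paper's actual repartition algorithm may define both differently. The helper names (build_call_graph, n_order_subgraphs) and the sample trace are hypothetical.

```python
from typing import Dict, List, Set, Tuple


# Hypothetical helper: an edge src -> dst means dst was called right after src.
def build_call_graph(trace: List[str]) -> Dict[str, Set[str]]:
    graph: Dict[str, Set[str]] = {api: set() for api in trace}
    for src, dst in zip(trace, trace[1:]):
        graph[src].add(dst)
    return graph


# Hypothetical NSG definition: every simple directed path with exactly n APIs.
def n_order_subgraphs(graph: Dict[str, Set[str]], n: int) -> Set[Tuple[str, ...]]:
    found: Set[Tuple[str, ...]] = set()

    def extend(path: List[str]) -> None:
        if len(path) == n:
            found.add(tuple(path))
            return
        for nxt in graph[path[-1]]:
            if nxt not in path:  # keep the path simple (no repeated API)
                extend(path + [nxt])

    for start in graph:
        extend([start])
    return found


# Illustrative trace of Windows API calls (not taken from the paper's dataset).
trace = ["CreateFileW", "WriteFile", "CloseHandle",
         "RegOpenKeyExW", "RegSetValueExW", "CloseHandle"]
nsgs = n_order_subgraphs(build_call_graph(trace), 3)
for nsg in sorted(nsgs):
    print(" -> ".join(nsg))
```

With N = 3 this trace yields five NSGs, such as CreateFileW -> WriteFile -> CloseHandle. Because repeated calls collapse into a single graph node (CloseHandle here), some NSGs, such as RegSetValueExW -> CloseHandle -> RegOpenKeyExW, capture dependencies that are not contiguous in the raw trace, which is the kind of graph-preserved structure a plain API n-gram would miss.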
Highlights
Malware refers to any software that aims at damaging or infiltrating computer systems [1]. The fast-growing number of malware variants poses a serious threat to malware detection.
We aim to enhance malware classification performance by constructing an appropriate behavioral representation of each malware family. The main contributions of this method are as follows: (1) We propose a graph repartition algorithm to extract fragment behaviors from the original Application Programming Interface (API) call graphs. The extracted fragment behavior is a graph-based API sequence that preserves the dependencies of the API call graph.
We propose a method that exploits the ideas of TF-IDF and information gain (IG) to evaluate the importance of an N-order subgraph (NSG).
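Scoring NSGs can be sketched with the standard TF-IDF weight and the standard information gain of a binary presence feature, treating each sample as a "document" whose "terms" are its NSGs. This is a plain baseline under stated assumptions, not the improved TF-IDF-like measure or improved IG proposed in the paper, whose modifications are not described here; the function names, NSG identifiers, and labels (1 = malware, 0 = benign) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List


# Standard TF-IDF: tf = count / sample length, idf = ln(N / document frequency).
def tf_idf(docs: List[List[Hashable]]) -> List[Dict[Hashable, float]]:
    n_docs = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # count each NSG at most once per sample
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights


# Shannon entropy (in bits) of a list of class labels.
def entropy(labels: List[int]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


# Standard IG of the binary feature "sample contains this NSG".
def information_gain(docs: List[List[Hashable]], labels: List[int],
                     feature: Hashable) -> float:
    with_f = [y for doc, y in zip(docs, labels) if feature in doc]
    without_f = [y for doc, y in zip(docs, labels) if feature not in doc]
    n = len(labels)
    conditional = ((len(with_f) / n) * entropy(with_f)
                   + (len(without_f) / n) * entropy(without_f))
    return entropy(labels) - conditional


# Hypothetical NSG identifiers for four labeled samples.
docs = [["nsg_a", "nsg_b", "nsg_a"], ["nsg_a", "nsg_c"],
        ["nsg_d", "nsg_b"], ["nsg_d", "nsg_c"]]
labels = [1, 1, 0, 0]

weights = tf_idf(docs)
vocab = {t for doc in docs for t in doc}
ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                reverse=True)
for t in ranked:
    print(t, round(information_gain(docs, labels, t), 3))
```

In this toy data nsg_a and nsg_d separate the two classes perfectly (IG of 1 bit), while nsg_b and nsg_c carry no class information (IG of 0), so a CNSG-style selection step would keep the top-ranked NSGs. The TF-IDF weight is per sample, whereas IG is computed once per NSG over the labeled set, which is why the two measures are complementary.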
Summary
Malware refers to any software that aims at damaging or infiltrating computer systems [1]. The fast-growing number of malware variants poses a serious threat to malware detection. Facing numerous and sophisticated malware variants, effective malware detection is urgently needed (e.g., see [3,4,5]). Malware detection methods are mainly divided into static and dynamic analysis [6]. Static analysis examines a program's instructions and structure to determine its functions [7], without running the program. However, static analysis is sensitive to sophisticated obfuscation and encryption techniques. The advantage of dynamic analysis is that it observes behavior by running programs in a virtual environment, so dynamic methods commonly cope with statically obfuscated malware [8] and encryption techniques. Dynamic analysis is therefore an effective way to recognize malware behavior. For example, when analyzing a keylogger, dynamic analysis can help us find the keylogger's log file and trace the information.