Abstract
Dynamic analysis and pattern matching techniques are widely used in industry and provide a straightforward method for identifying malware samples. Yara is a pattern matching tool that can scan sandbox memory dumps to identify malware families. However, pattern matching can fail silently when a sample contains minor code variations, leaving malware samples unidentified. This paper presents a two-layered Malware Variant Identification using Incremental Clustering (MVIIC) process, which clusters the unidentified samples to enable the identification of malware variants and new malware families. A novel incremental clustering algorithm identifies new malware variants among the unidentified samples. The research shows that clustering can provide a higher level of performance than Yara rules and that clustering is resistant to the small changes introduced by malware variants. The proposed hybrid approach uses Yara scanning to eliminate known malware, followed by clustering, so that the two layers act in concert to identify new malware variants. The results are evaluated using the F1 score and V-Measure clustering metrics.
Highlights
This paper provides a technique called Malware Variant Identification using Incremental Clustering (MVIIC)
While the clustering of dynamic analysis features has previously been used for malware detection [4,5], this paper proposes a hybrid scheme that uses Yara rules and a novel incremental clustering algorithm [6] to enable the identification of new malware families and malware variants
This paper proposes a two-layered MVIIC technique that uses a novel incremental clustering algorithm to support Yara-based malware family identification by clustering unidentified malware variants
Summary
This paper provides a technique called Malware Variant Identification using Incremental Clustering (MVIIC). A sandbox is an instrumented virtual machine that executes malware samples and other programs and gathers the dynamic analysis features that result from program execution. These features include filesystem activity, registry activity, network traffic, program execution, and code injection. Yara rules may fail to identify new malware variants when software development modifies the code matched by the Yara regular expressions, or when program strings are changed. These unidentified malware samples are a valuable source of unknown malware families and new malware variants.
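The two-layer flow described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: a plain regular expression stands in for a Yara rule, a simple leader-style scheme with Jaccard similarity stands in for the novel incremental clustering algorithm, and every name, feature string, sample, and threshold below is hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A cluster represented by the feature set of its first member (the leader)."""
    leader: frozenset
    members: list = field(default_factory=list)


def jaccard(a, b):
    """Jaccard similarity of two feature sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def signature_scan(memory_dump, signatures):
    """Layer 1 (stand-in for Yara): first family whose pattern matches, else None."""
    for family, pattern in signatures.items():
        if re.search(pattern, memory_dump):
            return family
    return None


def assign_incremental(features, clusters, threshold):
    """Layer 2: index of the most similar cluster, opening a new one if none is close."""
    best, best_sim = None, 0.0
    for idx, cluster in enumerate(clusters):
        sim = jaccard(features, cluster.leader)
        if sim > best_sim:
            best, best_sim = idx, sim
    if best is not None and best_sim >= threshold:
        return best
    clusters.append(Cluster(leader=features))
    return len(clusters) - 1


def process(samples, signatures, threshold=0.5):
    """Scan each sample; cluster only the samples that no signature identifies."""
    identified, clusters = {}, []
    for name, dump, feats in samples:
        family = signature_scan(dump, signatures)
        if family is not None:
            identified[name] = family
            continue
        idx = assign_incremental(frozenset(feats), clusters, threshold)
        clusters[idx].members.append(name)
    return identified, clusters


# Hypothetical data: (sample name, memory dump text, dynamic analysis features).
signatures = {"FamilyA": r"evil_mutex_v1"}
samples = [
    ("s1", "xx evil_mutex_v1 xx", {"reg:Run/upd", "file:tmp/a.dll", "net:10.0.0.5"}),
    ("s2", "xx evil_mutex_v2 xx", {"reg:Run/upd", "file:tmp/a.dll", "net:10.0.0.5", "proc:cmd"}),
    ("s3", "xx evil_mutex_v3 xx", {"reg:Run/upd", "file:tmp/a.dll", "net:10.0.0.9", "proc:cmd"}),
    ("s4", "unrelated", {"file:docs/x.enc", "proc:vssadmin", "net:192.0.2.7"}),
]

identified, clusters = process(samples, signatures)
print(identified)
print([c.members for c in clusters])
```

In this toy data, the samples s2 and s3 carry a changed string, so the signature misses them, yet their behavioural features overlap enough for the second layer to group them together, while the unrelated sample s4 opens its own cluster. Comparing each new sample only against cluster leaders keeps the assignment incremental, at the cost of making the result depend on arrival order.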