MalHunter: Automatic generation of multiple behavioral signatures for polymorphic malware detection

Abstract

Malicious software, also called malware, is one of the major threats on the Internet today. Despite various antivirus programs, thousands of Internet hosts are infected daily with malware, such as viruses, worms, and Trojan horses. By using a variety of obfuscation techniques, polymorphic malware can easily evade signature-based detection by continually changing its appearance or patterns. However, all polymorphic malware samples in the same malware family often follow the same behavioral pattern, which can be used to generate a behavioral signature. In this paper, we propose MalHunter, a novel method based on sequence clustering and sequence alignment for the automatic generation of behavioral signatures for polymorphic malware detection. We first generate a set of behavioral sequences for different samples of a polymorphic malware, each of which represents a thread's behavior. We then group similar behavioral sequences into the same cluster and generate an alignment pattern for each cluster. We finally build a multiple behavioral signature for the polymorphic malware. MalHunter stores fewer signatures in the signature database because it generates one multiple behavioral signature for the different samples of each polymorphic malware. The experimental results on a malware collection suggest that MalHunter is both precise and succinct for effective matching and detection of polymorphic malware.
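
The pipeline described in the abstract (behavioral sequences per thread, clustering of similar sequences, one alignment pattern per cluster, and a combined multiple signature) can be illustrated with a minimal Python sketch. This is not MalHunter's actual algorithm: the greedy clustering, the 0.6 threshold, the use of `difflib` matching blocks as a stand-in for a real sequence alignment, and the matching rule in `matches` are all assumptions made for illustration, as are the API-call names in the usage example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1] between two behavioral sequences (lists of event names)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_sequences(sequences, threshold=0.6):
    """Greedy clustering (assumed): a sequence joins the first cluster whose
    first member is at least `threshold` similar, else it starts a new cluster."""
    clusters = []
    for seq in sequences:
        for cluster in clusters:
            if similarity(cluster[0], seq) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def common_pattern(a, b):
    """Events shared by a and b, in order, taken from difflib's matching blocks.
    This approximates an alignment; it is not guaranteed to be the longest one."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    pattern = []
    for block in matcher.get_matching_blocks():
        pattern.extend(a[block.a:block.a + block.size])
    return pattern


def alignment_pattern(cluster):
    """Fold the pairwise common pattern over every sequence in the cluster."""
    pattern = list(cluster[0])
    for seq in cluster[1:]:
        pattern = common_pattern(pattern, seq)
    return pattern


def build_signature(sequences, threshold=0.6):
    """A 'multiple' signature here is simply one alignment pattern per cluster."""
    return [alignment_pattern(c) for c in cluster_sequences(sequences, threshold)]


def is_subsequence(pattern, trace):
    """True if the events of `pattern` occur in `trace` in the same order."""
    remaining = iter(trace)
    return all(event in remaining for event in pattern)


def matches(signature, thread_traces):
    """Assumed matching rule: every pattern must appear in some thread trace."""
    return all(any(is_subsequence(p, t) for t in thread_traces) for p in signature)


# Hypothetical thread traces from two samples of the same polymorphic malware.
file_a = ["CreateFile", "WriteFile", "RegSetValue", "CloseHandle"]
file_b = ["CreateFile", "Sleep", "WriteFile", "RegSetValue", "CloseHandle"]
net_a = ["socket", "connect", "send", "closesocket"]
net_b = ["socket", "connect", "Sleep", "send", "closesocket"]

signature = build_signature([file_a, file_b, net_a, net_b])
print(signature)
```

Four input sequences collapse into two patterns in this example, which is the storage saving the abstract refers to: the signature database holds one entry per malware rather than one per sample.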

Similar Papers
  • Conference Article
  • Citations: 12
  • 10.1109/icdm.2011.104
Modeling High-Level Behavior Patterns for Precise Similarity Analysis of Software
  • Dec 1, 2011
  • Taeho Kwon + 1 more

The analysis of software similarity has many applications such as detecting code clones, software plagiarism, code theft, and polymorphic malware. Because source code is often unavailable and code obfuscation is used to avoid detection, there has been much research on developing effective models to capture runtime behavior to aid detection. Existing models focus on low-level information such as dependencies or the mere occurrence of function calls, and suffer from poor precision, poor scalability, or both. To overcome the limitations of existing models, this paper introduces a precise and succinct behavior representation that characterizes high-level object-accessing patterns as regular expressions. We first distill a set of high-level patterns (the alphabet S of the regular language) based on two pieces of information: function call patterns to access objects and type state information of the objects. Then we abstract a runtime trace of a program P into a regular expression e over the pattern alphabet S to produce P's behavior signature. We show that software instances derived from the same code exhibit similar behavior signatures and develop effective algorithms to cluster and match behavior signatures. To evaluate the effectiveness of our behavior model, we have applied it to the similarity analysis of polymorphic malware. Our results on a large malware collection demonstrate that our model is both precise and succinct for effective and scalable matching and detection of polymorphic malware.
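
As a rough illustration of abstracting a runtime trace into a regular expression over a pattern alphabet, the following Python sketch collapses repeated runs of the same pattern into `x+`. It is a toy under stated assumptions, not the paper's abstraction: the alphabet, the pattern names, and the run-collapsing rule are all hypothetical.

```python
import re
from itertools import groupby

# Hypothetical pattern alphabet: each high-level pattern maps to one symbol.
ALPHABET = {
    "open_file": "o",
    "read_file": "r",
    "write_file": "w",
    "close_file": "c",
}


def encode(trace, alphabet=ALPHABET):
    """Encode a trace (list of pattern names) as a string over the alphabet."""
    return "".join(alphabet[p] for p in trace)


def trace_to_regex(trace, alphabet=ALPHABET):
    """Abstract a trace into a regex by collapsing runs of a symbol into `x+`."""
    parts = []
    for symbol, run in groupby(alphabet[p] for p in trace):
        repeated = len(list(run)) > 1
        parts.append(re.escape(symbol) + ("+" if repeated else ""))
    return "".join(parts)


def same_behavior(signature_regex, trace, alphabet=ALPHABET):
    """True if the encoded trace is in the language of the signature regex."""
    return re.fullmatch(signature_regex, encode(trace, alphabet)) is not None


reference = ["open_file", "read_file", "read_file", "read_file", "close_file"]
signature = trace_to_regex(reference)
print(signature)
```

A variant that reads the file a different number of times still matches the signature, while a trace that writes instead of reads does not.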

  • Conference Article
  • Citations: 21
  • 10.1109/isssc50941.2020.9358835
Malware Detection & Classification using Machine Learning
  • Dec 16, 2020
  • 2020 IEEE International Symposium on Sustainable Energy, Signal Processing and Cyber Security (iSSSC)
  • Sanket Agarkar + 1 more

In today's internet world, malware is still the most harmful threat to internet users. Newly developed malware is distinct from conventional malware: it is more dynamic in design and usually inherits properties from two or more malware types. Malware of this type is called polymorphic. Polymorphic malware is a form of malware that constantly modifies its recognisable features to fool detection by traditional signature-based models. Behavior-based identification of ransomware tests not just the file's identity, but also the operations it intends to perform after some time span or at a specific time. Behavioural patterns can be derived from static or dynamic analysis, and with these patterns various machine learning models can be used to predict whether a file is malware or to identify its malware family. In this work, behavior-based detection methods are addressed, along with how various machine learning techniques are used to develop behavior-based malware detection and classification methods.

  • Research Article
  • Citations: 59
  • 10.1186/s13635-017-0055-6
Polymorphic malware detection using sequence classification methods and ensembles
  • Jan 23, 2017
  • EURASIP Journal on Information Security
  • Jake Drew + 2 more

Identifying malicious software executables is made difficult by the constant adaptations introduced by miscreants in order to evade detection by antivirus software. Such changes are akin to mutations in biological sequences. Recently, high-throughput methods for gene sequence classification have been developed by the bioinformatics and computational biology communities. In this paper, we apply methods designed for gene sequencing to detect malware in a manner robust to attacker adaptations. Whereas most gene classification tools are optimized for and restricted to an alphabet of four letters (nucleic acids), we have selected the Strand gene sequence classifier for malware classification. Strand’s design can easily accommodate unstructured data with any alphabet, including source code or compiled machine code. To demonstrate that gene sequence classification tools are suitable for classifying malware, we apply Strand to approximately 500 GB of malware data provided by the Kaggle Microsoft Malware Classification Challenge (BIG 2015) used for predicting nine classes of polymorphic malware. Experiments show that, with minimal adaptation, the method achieves accuracy levels well above 95% requiring only a fraction of the training times used by the winning team’s method.

  • Conference Article
  • Citations: 9
  • 10.1109/comsnets53615.2022.9668396
Polymorphic Malware Behavior Through Network Trace Analysis
  • Jan 4, 2022
  • Xiyue Deng + 1 more

Malware continues to be a major threat to information security. To avoid being detected and analyzed, modern malware is continuously improving its stealthiness, including code obfuscation and encryption. On the other hand, the high number of unique malware samples detected daily suggests a likely high degree of code reuse under the layers of stealth. We observe that although obfuscation greatly changes a malware's binary, its functionalities remain intact. We propose to leverage malware's network behavior during its execution to understand the malware's functionality and detect related or even the same (polymorphic) malware. While malware may transform its code to evade analysis, we contend that its key network behaviors must endure through the transformations to achieve the malware's ultimate purpose, such as sending victim information, scanning for vulnerable hosts, etc. We propose an encoding of malware samples that can help us classify samples, identify code reuse and genealogy, and develop behavioral signatures for malware defense based on malware's network behavior. We leverage the same encoding to identify polymorphic malware in a random dataset containing more than 8,000 diverse samples from the Georgia Tech Apiary project. We cluster 6,595 samples which show some network activity based on our embedding features; more than 90% of the clusters contain potentially polymorphic malware, with up to 80% of the clusters identifying truly polymorphic malware samples, i.e., they have identical network behavior as at least one other sample in our dataset. Such a high level of polymorphism indicates a high level of code reuse, and shows how our approach can complement traditional code analysis techniques for malware defense.

  • Conference Article
  • Citations: 5
  • 10.1063/5.0104235
Metamorphic and polymorphic malware detection and classification using dynamic analysis of API calls
  • Jan 1, 2022
  • AIP conference proceedings
  • Vivekanand Kuriyal + 3 more

Malicious programs have created a major threat in the area of cyber security. Malware detection and classification is a big challenge for researchers. Nowadays, machine learning techniques using dynamic analysis of a malicious file play an important role in malware detection. Some new types of malware, such as polymorphic and metamorphic malware, cannot be detected easily. Their tactics hide them from anti-malware systems: such malware creates a new instance, encrypting the malicious payload and changing the code structure at each infection, while retaining the same functionality. To address this, we propose a model for polymorphic and metamorphic malware detection. This paper addresses the detection and classification problem by providing a deeper analysis of API calls, key features, and their parameters that enable polymorphism in malware. We named this model MPDC. This paper also proposes a feature engineering approach for better classification of malware families; this research is based on behavioral (dynamic) feature analysis and API calls. We used 8 malware families for classification. Our model achieved a detection accuracy rate of 98.74% and a malware family classification accuracy rate of 96%. This research will revolutionize the anti-malware industry in creating better protection mechanisms.

  • Conference Article
  • Citations: 48
  • 10.1109/lcn.2009.5355037
Measuring similarity of malware behavior
  • Oct 1, 2009
  • Martin Apel + 2 more

Malicious software (malware) represents a major threat for computer systems of almost all types. In the past few years the number of prevalent malware samples has increased dramatically due to the fact that malware authors started to deploy morphing (aka obfuscation) techniques in order to hinder detection of such polymorphic malware by anti-malware products. Using these techniques numerous variants of a malware can be generated. All these variants have a different syntactic representation while providing almost the same functionality and showing similar behavior. In order to effectively detect polymorphic malware it is advantageous (if not required) to know which malware samples are variants of a particular malware. Respective approaches for determining this relation between malware samples automatically are currently investigated by a number of researchers. A prerequisite for assessing this relation based on particular features of malware samples is an appropriate similarity or distance measure. In particular a number of approaches for clustering malware samples have been recently published. Thereby different similarity measures are used but without thoroughly discussing their choice. So it is an unanswered question which similarity measures are appropriate for determining respective relations between malware samples. To answer this question we study different distance measures in detail and discuss desirable properties of a distance measure for this particular purpose. We focus on behavioral features of malware and compare and experimentally evaluate different distance measures for malware behavior. Based on our results we identify a most appropriate distance measure for grouping malware samples based on similar behavior.
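
The abstract does not say which distance measures are compared, so the following Python sketch only illustrates what a distance measure over behavior sequences can look like. The two measures shown (normalized edit distance and Jaccard distance over n-grams) and the event names are assumptions chosen because they are common, not the measures the paper identifies as most appropriate.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a, b):
    """Edit distance scaled to [0, 1] by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def ngrams(seq, n=2):
    """Set of contiguous n-grams of a behavior sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def jaccard_distance(a, b, n=2):
    """1 - |A ∩ B| / |A ∪ B| over the n-gram sets; ignores order beyond n events."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    if not union:
        return 0.0
    return 1.0 - len(ga & gb) / len(union)


# Hypothetical behavior traces of two variants.
variant_1 = ["open", "read", "write", "close"]
variant_2 = ["open", "read", "close"]
print(normalized_edit_distance(variant_1, variant_2))
print(jaccard_distance(variant_1, variant_2))
```

The two measures disagree on how far apart these traces are, which is the kind of difference that makes the choice of measure matter for clustering.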

  • Conference Article
  • Citations: 70
  • 10.1109/spw.2016.30
Polymorphic Malware Detection Using Sequence Classification Methods
  • May 1, 2016
  • Jake Drew + 2 more

Polymorphic malware detection is challenging due to the continual mutations miscreants introduce to successive instances of a particular virus. Such changes are akin to mutations in biological sequences. Recently, high-throughput methods for gene sequence classification have been developed by the bioinformatics and computational biology communities. In this paper, we argue that these methods can be usefully applied to malware detection. Unfortunately, gene classification tools are usually optimized for and restricted to an alphabet of four letters (nucleic acids). Consequently, we have selected the Strand gene sequence classifier, which offers a robust classification strategy that can easily accommodate unstructured data with any alphabet including source code or compiled machine code. To demonstrate Strand's suitability for classifying malware, we execute it on approximately 500 GB of malware data provided by the Kaggle Microsoft Malware Classification Challenge (BIG 2015) used for predicting 9 classes of polymorphic malware. Experiments show that, with minimal adaptation, the method achieves accuracy levels well above 95% requiring only a fraction of the training times used by the winning team's method.

  • Conference Article
  • 10.1109/icauc68182.2026.11441025
Comparative Analysis of Polymorphic Malware Detection Methods in Modern Cybersecurity
  • Jan 19, 2026
  • Darshan P + 1 more

Detecting malware has become more challenging as new forms of threats keep changing over time and are no longer limited to fixed code patterns. Polymorphic malware is one such problem: it changes its internal structure many times, but the malicious intent remains the same. As a result, traditional signature-based detection methods cannot be trusted to offer good protection. In response, researchers have, over the years, come up with various detection strategies, each approaching the problem in a distinct way. Some methods are based on the analysis of program code without running it, whereas others are based on the analysis of behavior at runtime. More recent solutions integrate these notions or use learning-based models to enhance detection ability. Nevertheless, the effectiveness of these methods varies and is highly dependent on computational cost, scalability, and deployment limitations. This paper discusses current polymorphic malware detection techniques in terms of their practical advantages and limitations instead of focusing on their accuracy. The discussion highlights why some techniques are effective in controlled environments but fail in real-world environments. By outlining these trade-offs, the study aims to inform the decision-making of researchers and practitioners in modern cybersecurity systems.

  • Research Article
  • Citations: 5
  • 10.1109/access.2019.2914031
Malware Clustering Using Family Dependency Graph
  • Jan 1, 2019
  • IEEE Access
  • Binlin Cheng + 3 more

Malware poses a major security threat on the Internet today. It is not surprising that much research has concentrated on detecting malware. Unfortunately, current malware detection approaches suffer from ineffective detection of new malware samples. These models effectively identify known malware samples but not new variants. To address this issue, we propose a novel malware detection approach based on the family graph. First, we trace the API calls of the monitored application, and then we generate the dependency graph based on the dependency relationship of the API calls. Finally, we construct the family dependency graph by clustering the graphs of a known malware family. In this way, we can determine whether a new sample belongs to a known malware family. The evaluation results show that our approach is effective with small overhead compared to other existing approaches.
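
The three steps in this abstract (trace API calls, derive a dependency graph, combine the graphs of a family) can be sketched in Python as follows. This is only one possible reading: the data-flow definition of a dependency, the edge-support rule used to combine graphs, the 0.8 membership threshold, and the call records are all assumptions, not details given in the abstract.

```python
from collections import Counter


def dependency_edges(calls):
    """Build dependency edges from a traced call list.

    `calls` is a list of (api_name, inputs, outputs). An edge (A, B) is added
    when call B consumes a value (e.g. a handle) that call A produced.
    """
    producer = {}
    edges = set()
    for name, inputs, outputs in calls:
        for value in inputs:
            if value in producer:
                edges.add((producer[value], name))
        for value in outputs:
            producer[value] = name
    return edges


def family_graph(sample_graphs, min_support=1.0):
    """Keep edges present in at least `min_support` of the family's samples."""
    counts = Counter(edge for graph in sample_graphs for edge in graph)
    needed = min_support * len(sample_graphs)
    return {edge for edge, count in counts.items() if count >= needed}


def belongs_to_family(graph, family, threshold=0.8):
    """True if the sample covers at least `threshold` of the family's edges."""
    if not family:
        return False
    return len(graph & family) / len(family) >= threshold


# Hypothetical traces of two variants from the same family.
sample_1 = [
    ("CreateFile", [], ["h1"]),
    ("WriteFile", ["h1"], []),
    ("CloseHandle", ["h1"], []),
]
sample_2 = [
    ("CreateFile", [], ["h7"]),
    ("GetFileSize", ["h7"], []),
    ("WriteFile", ["h7"], []),
    ("CloseHandle", ["h7"], []),
]

family = family_graph([dependency_edges(sample_1), dependency_edges(sample_2)])
print(sorted(family))
```

Because edges are keyed by API names rather than by concrete handle values, the variant-specific handle names and the extra `GetFileSize` call do not affect the shared family graph.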

  • Research Article
  • Citations: 1
  • 10.58254/viti.5.2024.16.181
Justification of the choice of the approach to the determination of the invariant component in the behavior of polymorphic (metamorphic) malware on the basis of reducing the dimensionality of the sign space
  • Jun 1, 2024
  • Communication, informatization and cybersecurity systems and technologies
  • V Fesokha + 2 more

The evolution of malware use scenarios necessitates the development of effective strategies to neutralise their destructive impact. One of the most threatening types of malware is polymorphic (metamorphic) viruses, as they are largely able to evade detection by intrusion detection systems, information security management (security events), antivirus software and systems for proactive detection of atypical threats and targeted attacks on endpoints due to their ability to change their own signature. In addition, there has been a rapid increase in recent cyber incidents involving the use of polymorphic (metamorphic) malware. The main reason for this growth is the availability of artificial intelligence technologies that allow attackers to modify the code of already classified malware quickly and efficiently, without requiring significant specialised technical competence. A comparative analysis of existing approaches to detecting polymorphic, oligomorphic and metamorphic malware is carried out. It is found that no group of methods uses to its advantage the key feature of polymorphic (metamorphic) malware – invariant behaviour by a certain subset of features that characterise the same vector of destructive impact of malware. With a view to neutralising the property of modification of its own code by polymorphic (metamorphic) malware, the article proposes an approach to determining its invariant component during behavioural analysis based on a combination of the advantages of behavioural analysis and machine learning techniques – reducing the dimensionality of the studied feature space. Such an approach will potentially allow determining the invariant behaviour of malware as a subset of the studied features for each known type of malware, which in turn forms the basis for implementing a new approach to the effective detection of modified (advanced) malware.

  • Book Chapter
  • 10.1007/978-981-99-1767-9_11
Malware Classification in Local System Executable Files Using Deep Learning
  • Jan 1, 2023
  • Pagadala Ganesh Krishna + 2 more

One of the biggest and most severe risks on the Internet today is malicious software, generally known as malware. Attackers are producing malware that has the ability to change its source code as it spreads and is polymorphic and metamorphic. Furthermore, the variety and quantity of their variants seriously compromise the effectiveness of current defences, which frequently rely on signature-based techniques and are unable to identify malicious executables that have not yet been detected. Variants from different malware families have behavioural traits that are indicative of their function and place in society. Utilizing the behavioural patterns obtained either statically or dynamically, deep learning techniques can be utilized to discover and classify novel viruses into their recognized families. In this digital age, security failures brought on by malware attacks are on the rise and pose a serious security concern. Malware detection is still a strongly contested academic topic because of the significant implications that malware attacks have on businesses, governments, and computer users. For the real-time identification of unknown malware, the efficacy of current malware detection techniques, which entail the static and dynamic analysis of malware signatures and behaviour patterns, has not been shown. For classifying malware, we mostly utilize CNN and ELM deep learning algorithms.

  • Conference Article
  • Citations: 9
  • 10.1109/icitcs.2016.7740362
Polymorphic Malware Detection
  • Sep 1, 2016
  • Nur Syuhada Selamat + 2 more

The most common method of detecting malware relies on signature-based detection. Polymorphic malware poses a serious threat to modern computing. The challenge with this type of malware is that it is difficult for antivirus (AV) technology to detect. Polymorphic malware cannot be detected by AV scanners because its code mutates itself. The mutated code is generated by a polymorphic engine, also called a mutation engine, to make the malware more difficult to read. In this paper, the researchers examine how to detect polymorphic malware from a list of sample files based on dropped files.

  • Conference Article
  • Citations: 2
  • 10.23919/splitech49282.2020.9243841
Smart Malware Detection: From Signatures to Artificial Intelligence
  • Sep 23, 2020
  • Jannatul Ferdaos + 4 more

Living in the digital era has brought us countless benefits while introducing certain risks. Today, hundreds of thousands of new malware samples appear every day, thus increasing the risk of data being stolen, corrupted or exploited by malicious entities. While signatures are typically used to detect known malware using anti-virus scanners, this approach is unable to detect new malware (i.e. zero day attacks), encrypted malware, or polymorphic malware able to change its identifiable features or behavior to evade detection. In this work, we propose a smart artificial intelligence based malware detection approach that leverages a combination of machine learning models as well as static and dynamic analysis techniques for the real-time detection of new or polymorphic malware. The system design is elaborated and extensive testing results are presented to showcase the capabilities of our proposed solution. The performance of eight machine learning models is compared to identify the optimal model for static and dynamic malware analysis, thus providing insights on the ability to use machine learning for real-time malware detection.

  • Research Article
  • Citations: 38
  • 10.1111/coin.12551
Attention‐based convolutional neural network deep learning approach for robust malware classification
  • Sep 24, 2022
  • Computational Intelligence
  • Vinayakumar Ravi + 1 more

Recently, transforming Windows files into images and analysing them using machine learning and deep learning has been considered state‐of‐the‐art work for malware detection and classification. This is mainly due to the fact that image‐based malware detection and classification is platform independent, and to the recent surge of success of deep learning model performance in image classification. A literature survey shows that convolutional neural network (CNN) deep learning methods are successfully employed for image‐based Windows malware classification. However, the malware is embedded in a tiny portion of the overall image representation. Identifying and locating these affected tiny portions is important to achieve good malware classification accuracy. In this work, a multi‐headed attention based approach is integrated into a CNN to locate and identify the tiny infected regions in the overall image. A detailed investigation and analysis of the proposed method was done on a malware image dataset. The performance of the proposed multi‐headed attention‐based CNN approach was compared with various non‐attention‐CNN‐based approaches on various data splits of training and testing malware image benchmark dataset. In all the data splits, the attention‐based CNN method outperformed non‐attention‐based CNN methods while ensuring computational efficiency. Most importantly, most of the methods show consistent performance on all the data splits of training and testing, which demonstrates the generalizability of the multi‐headed attention CNN model across diverse datasets. With fewer trainable parameters, the proposed method has achieved an accuracy of 99% in classifying the 25 malware families and performed better than the existing non‐attention based methods. The proposed method can be applied on any operating system and it has the capability to detect packed malware, metamorphic malware, obfuscated malware, malware family variants, and polymorphic malware. In addition, the proposed method is malware file agnostic and avoids usual methods such as disassembly, de‐compiling, de‐obfuscation, or execution of the malware binary in a virtual environment in detecting malware and classifying malware into their malware family.

  • Conference Article
  • Citations: 17
  • 10.1109/dasc.2011.47
Polymorphic Malware Detection Using Hierarchical Hidden Markov Model
  • Dec 1, 2011
  • Fahad Bin Muhaya + 2 more

Binary signatures have been widely used to detect malicious software on the current Internet. However, this approach is unable to accurately identify polymorphic malware variants, which can be easily generated by malware authors using code generation engines. Code generation engines randomly produce varying code sequences that perform the same desired malicious functions. Previous research used flow graphs and signature trees to identify polymorphic malware families. The key difficulty of previous research is the generation of precisely defined state machine models from polymorphic variants. This paper proposes a novel approach, using the Hierarchical Hidden Markov Model (HHMM), to provide accurate inductive inference of the malware family. This model can capture the self-similar and hierarchical structure of polymorphic malware family signature sequences. To demonstrate the effectiveness and efficiency of this approach, we evaluate it with real malware samples. Using more than 15,000 real malware samples, we find our approach can achieve high true positives, low false positives, and low computational cost.
