Abstract

Metamorphic malware poses a significant threat to conventional signature-based malware detection since its signature is mutable. Multiple copies can be created from metamorphic malware. As such, signature- based malware detection is impractical and ineffective. Thus, research in recent years has focused on applying machine learning-based approaches to malware detection. Profile Hidden Markov Model is a probabilistic model that uses multiple sequence alignments and a position-based scoring system. An enhanced Profile Hidden Markov Model was constructed with the following modifications: n-gram analysis to determine the best length of n-gram for the dataset, setting frequency threshold to determine which n-gram opcodes will be included in the malware detection, and adding consensus sequences to multiple sequence alignments. 1000 malware executables files and 40 benign executable files were utilized in the study. Results show that n-gram analysis and adding consensus sequence help increase malware detection accuracy. Moreover, setting the frequency threshold based on the average TF-IDF of n-gram opcodes gives the best accuracy in most malware families than just by getting the top 36 most occurring n-grams, as done in previous studies.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call