Abstract

Malicious software poses serious threats to our lives, and the activity to detect malware is becoming more and more important. An effective approach is to train a classifier using known software samples and malware samples, and recognize malware from new software. To do that, a recent popular trend is to use OpCode, which is extracted from executable modules, as an expression of software entities to drive machine learning. However, we found that the effectiveness of such a framework highly suffers from having insufficient samples, which is caused by the low success rate of disassembly due to the intrinsic complexity of the problem. In this paper, we propose to increase the success rate of disassembly by allowing inaccurate disassembling, with the attempt to increase the number of successful disassembled samples to improve OpCode-driven malware detection. We built a lightweight disassembler D-light based on the linear swap disassembly method to avoid known issues with the recursive descent manner of IDA Pro. We carried out experiment to evaluate the performance, effectiveness, and other design factors of adopting D-light and IDA Pro as disassemblers for malware detection. The empirical study shows the D-light is both more efficient and more effective than IDA Pro in supporting malware detection.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call