Abstract

Malware detection engines are required to achieve a high True Positive Rate (TPR) and a low False Positive Rate (FPR). For example, to obtain the VB100 certification conducted by Virus Bulletin in the U.K., a detector's TPR must be 99.5% or higher and its FPR must be less than 0.01%. However, it is difficult for signature-based malware detection engines to detect zero-day malware or new variants of existing malware. For such malware, an approach based on machine learning is considered effective because it analyzes the features of specimens in a complex manner. Therefore, we built a malware detection model by applying machine learning to surface analysis logs and PE header dumps, using the FFRI Dataset 2018. Furthermore, we verified the model's accuracy under the constraint that the FPR stays below a given value. As a result, we succeeded in creating a new model with high accuracy: the TPR is 99.7% when the FPR is constrained to less than 1%, 98.7% when the FPR is constrained to less than 0.1%, and 94.5% when the FPR is constrained to less than 0.01%. In addition, we identified the features that contribute most to malware detection in this model.
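
To make the evaluation criterion concrete, the following is a minimal sketch of how a TPR can be measured under an upper bound on the FPR. It is not the procedure used in this work, which the abstract does not describe; the function name `tpr_at_max_fpr`, the convention that a sample is flagged when its score is at or above the threshold, and the toy scores and labels are all assumptions made for illustration.

```python
def tpr_at_max_fpr(scores, labels, max_fpr):
    """Return (tpr, threshold) for the loosest threshold whose FPR is strictly below max_fpr.

    scores: detector outputs, where a higher score means "more likely malware" (assumed)
    labels: 1 for malware, 0 for benign
    A sample is flagged as malware when score >= threshold (assumed convention).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one malware and one benign sample")

    # Visit candidate thresholds from strictest (highest score) to loosest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_tpr, best_threshold = 0.0, float("inf")
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score so the counts match the threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg >= max_fpr:
            break  # FPR can only grow as the threshold loosens
        best_tpr, best_threshold = tp / n_pos, threshold
    return best_tpr, best_threshold


# Toy data, invented for this example only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]
print(tpr_at_max_fpr(toy_scores, toy_labels, max_fpr=0.25))
```

Note that a bound such as FPR < 0.01% only becomes meaningful with more than 10,000 benign samples; with fewer, the only way to satisfy it is to produce zero false positives.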
