Abstract
Malware increasingly threatens users around the world across a variety of computing platforms, causing billions of dollars in damages each year. In recent years, to improve the detection capabilities of widely used antivirus (AV) tools, machine learning (ML) algorithms and dynamic malware analysis have been leveraged to extract and learn from rich multivariate time-series data (MTSD) that capture behavioral information. Such MTSD can be exploited with a time-interval temporal pattern (TP) mining approach; however, this approach has not been widely explored for malware detection. TPs enable the discovery of complex temporal relations between different variables, improve the ability to handle missing values and noisy data, and provide explainability. Because new, previously unknown malware is created every day, detection mechanisms require frequent updating to keep pace with the changing threat landscape. Active learning (AL) can address this updatability gap by efficiently selecting and acquiring a small yet informative set of new samples, which reduces the labeling effort required of experts while maximizing the improvement of ML-based detection models and thereby helps keep antimalware tools up to date. However, AL methods for acquiring time-interval TP-based samples have yet to be explored. In this paper, we present novel AL methods and a detection framework for improved malware detection based on dynamic analysis, time-interval TPs, and ML algorithms. The proposed framework both prioritizes the acquisition of malicious samples and improves the malware detection capabilities of ML classifiers and antimalware tools. We evaluated the framework in an extensive set of experiments on a comprehensive data collection of 9,328 portable executables (5,000 benign and 4,328 malicious) that were executed in a Windows 10 environment.
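The abstract does not describe how the time-interval TPs are built. As a minimal sketch under assumptions not stated in this paper, the code below abstracts two behavioral time series into symbolic "high" intervals using a fixed threshold, then labels a pair of intervals with one of the seven Allen interval relations that remain once each pair is ordered by (start, end), a representation commonly used in time-interval TP mining. The names `SymbolicInterval`, `abstract_to_intervals`, and `allen_relation`, the threshold, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class SymbolicInterval:
    """A labeled interval [start, end) derived from one variable (hypothetical schema)."""
    symbol: str
    start: int
    end: int


def abstract_to_intervals(name: str, series: Sequence[float], threshold: float) -> List[SymbolicInterval]:
    """Turn a univariate series into maximal runs where value >= threshold.

    Hypothetical abstraction: fixed threshold, unit time steps, exclusive end index.
    """
    intervals: List[SymbolicInterval] = []
    run_start: Optional[int] = None
    for t, value in enumerate(series):
        if value >= threshold and run_start is None:
            run_start = t  # a "high" run begins
        elif value < threshold and run_start is not None:
            intervals.append(SymbolicInterval(f"{name}_high", run_start, t))
            run_start = None  # the run ended at t (exclusive)
    if run_start is not None:  # close a run that lasts until the end of the series
        intervals.append(SymbolicInterval(f"{name}_high", run_start, len(series)))
    return intervals


def allen_relation(a: SymbolicInterval, b: SymbolicInterval) -> str:
    """Return one of seven Allen relations for a pair ordered by (start, end).

    Assumes start < end for both intervals.
    """
    if (a.start, a.end) > (b.start, b.end):
        a, b = b, a  # lexicographic ordering leaves only 7 possible relations
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if a.start == b.start:  # ordering guarantees a.end <= b.end here
        return "equals" if a.end == b.end else "starts"
    # Remaining case: a.start < b.start < a.end
    if a.end < b.end:
        return "overlaps"
    if a.end == b.end:
        return "finished-by"
    return "contains"


cpu = [10, 80, 90, 85, 20, 10]  # e.g., CPU usage (%) per second; made-up values
net = [0, 0, 70, 75, 90, 10]    # e.g., outgoing packets per second; made-up values
cpu_high = abstract_to_intervals("cpu", cpu, threshold=50)[0]
net_high = abstract_to_intervals("net", net, threshold=50)[0]
print(cpu_high.symbol, allen_relation(cpu_high, net_high), net_high.symbol)
# prints: cpu_high overlaps net_high
```

A TP would then be a set of such symbols together with the pairwise relations among them (for example, "cpu_high overlaps net_high"), whose occurrences in a file's behavior could serve as features for a classifier.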
The results demonstrated that our AL methods prioritize the acquisition of malware, acquiring up to 93.5% of the malicious files each day and thus allowing antimalware tools to be updated frequently. In addition, our framework improved the detection capabilities of several ML classifiers over time, with the best results (an AUC of 95.15%) achieved by the SVM classifier. The framework also demonstrated that TPs can be used to identify emerging trends in malicious behavior.
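The abstract does not specify the acquisition criteria of the proposed AL methods. The sketch below is therefore a generic illustration, not the paper's method: it contrasts an exploitation-style strategy, which spends the daily labeling budget on the files a current model scores as most likely malicious (matching the stated goal of prioritizing malware acquisition), with classic uncertainty sampling. The feature names, weights, `toy_score`, and `select_for_labeling` are hypothetical, and `score_fn` stands in for any classifier that returns P(malicious).

```python
import heapq
import math
from typing import Callable, Dict, List

Features = Dict[str, float]  # hypothetical: TP name -> frequency in a file's behavior


def select_for_labeling(
    pool: Dict[str, Features],
    score_fn: Callable[[Features], float],
    budget: int,
    strategy: str = "exploitation",
) -> List[str]:
    """Pick `budget` unlabeled file IDs to send to an expert for labeling.

    "exploitation": highest predicted maliciousness first.
    "uncertainty": predictions closest to 0.5 first.
    `score_fn` is assumed to return P(malicious) in [0, 1].
    """
    if strategy == "exploitation":
        key = lambda fid: score_fn(pool[fid])
    elif strategy == "uncertainty":
        key = lambda fid: -abs(score_fn(pool[fid]) - 0.5)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return heapq.nlargest(budget, pool, key=key)  # iterating a dict yields its keys


# Hypothetical linear model over TP features, squashed with a logistic function.
WEIGHTS = {"inject_before_net": 2.0, "write_meets_exec": 1.5, "ui_idle": -1.0}


def toy_score(x: Features) -> float:
    z = sum(WEIGHTS.get(name, 0.0) * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-z))


pool = {
    "a.exe": {"inject_before_net": 1.0},                  # score ~0.88
    "b.exe": {"ui_idle": 1.0},                            # score ~0.27
    "c.exe": {"write_meets_exec": 1.0, "ui_idle": 1.0},   # score ~0.62
    "d.exe": {},                                          # score 0.5
}
print(select_for_labeling(pool, toy_score, budget=2))                          # ['a.exe', 'c.exe']
print(select_for_labeling(pool, toy_score, budget=2, strategy="uncertainty"))  # ['d.exe', 'c.exe']
```

In a daily update cycle, the selected files would be labeled, added to the training set, and the model retrained before the next day's selection. The exploitation strategy tends to maximize the number of malicious files acquired, while uncertainty sampling targets the model's decision boundary.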