Ransomware Detection by Distinguishing API Call Sequences through LSTM and BERT Models

Tu-Liang Lin, Hong-Yi Chang, Wha-Lee Tseng, Chun-Jun Zhuang, Shu-Cheng Lin, Yuan-Yao Chiang, Bo-Hao Zhang, Tsung-Yen Yang

doi:10.1093/comjnl/bxad005

Abstract

Ransomware has evolved rapidly, and preventing it has become an important issue. The threat it poses to governments and enterprises is far more sophisticated than before; the breach or corruption of sensitive data can have a huge impact on an organization. Early detection is one of the effective methods for preventing a ransomware attack. Modern ransomware detection technologies can be divided into two categories: static analysis and dynamic analysis. Dynamic analysis observes the behavior of a running program. Previous research adopted machine learning approaches for dynamic analysis, using API sequence datasets to train the models. In this research, we collected the API calls of ransomware from reports generated by Cuckoo Sandbox and proposed two detection models based on BERT and LSTM. The results show that both the BERT and LSTM models can successfully predict ransomware with a high accuracy of 95%. We aimed to compare the performance of the two text-based learning models, LSTM and BERT, and to analyze their pros and cons. The results show that API sequence data can be used to train effective ransomware detection models in a text-based manner.
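To illustrate what treating API calls "in a text-based manner" can look like, the following is a minimal sketch of turning a Cuckoo Sandbox report into a token sequence. It is not the authors' pipeline. It assumes the report is the JSON file Cuckoo writes, with calls stored under `behavior` → `processes` → `calls` and the call name in the `api` field; the helper names `extract_api_sequence` and `to_sentence`, the process-by-process ordering, and the 512-call cap are all hypothetical choices made for illustration.

```python
import json
from typing import Any, Dict, List


def load_report(path: str) -> Dict[str, Any]:
    """Load a Cuckoo JSON report from disk (path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def extract_api_sequence(report: Dict[str, Any]) -> List[str]:
    """Collect API call names, process by process, in recorded order.

    Assumes the layout report["behavior"]["processes"][i]["calls"][j]["api"].
    Missing keys are tolerated and simply yield fewer calls.
    """
    sequence: List[str] = []
    behavior = report.get("behavior", {})
    for process in behavior.get("processes", []):
        for call in process.get("calls", []):
            api_name = call.get("api")
            if api_name:
                sequence.append(api_name)
    return sequence


def to_sentence(sequence: List[str], max_calls: int = 512) -> str:
    """Join the first max_calls API names into one space-separated string.

    The cap is an illustrative assumption; sequence models usually need
    some bound on input length.
    """
    return " ".join(sequence[:max_calls])


# A tiny hand-made report standing in for real sandbox output.
sample_report = {
    "behavior": {
        "processes": [
            {"calls": [{"api": "NtCreateFile"}, {"api": "NtWriteFile"}]},
            {"calls": [{"api": "CryptEncrypt"}]},
        ]
    }
}

sample_sequence = extract_api_sequence(sample_report)
sample_sentence = to_sentence(sample_sequence)
```

The resulting string can then be tokenized like ordinary text before being passed to a sequence classifier such as an LSTM or BERT model.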

Full Text