Abstract

In malware analysis, understanding the reasons behind a decision is important for building trust in the system. In the case of opcode-sequence-based classifiers, standard explanation methods such as LIME often provide little insight into the salient parts of the input sequence. This is because LIME treats each opcode as an independent feature, and perturbing any single opcode rarely causes a significant change in the output, so the resulting explanation tends to look like random noise. In this paper, we introduce a novel method, Hierarchical LIME (H-LIME), to address this issue. We take into consideration the hierarchical structure of the program, which is composed of classes and methods. We show that when H-LIME is applied at the level of classes and methods, the resulting explanation is sparser, which greatly improves its interpretability. We conduct extensive experiments evaluating our proposed method against criteria for accuracy, completeness, sparsity, stability, and efficiency, and show that it significantly improves on all of these criteria compared to other explainability methods.
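The core idea described above, perturbing whole methods rather than individual opcodes, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name `perturb_hierarchical` and the representation of methods as contiguous `(start, end)` opcode spans are assumptions made for the example. Each perturbation sample toggles entire method spans on or off, producing the coarser, group-level perturbations from which a LIME-style linear surrogate would then be fit.

```python
import random

def perturb_hierarchical(opcodes, method_spans, n_samples=100, seed=0):
    """LIME-style perturbation at method granularity (illustrative sketch).

    opcodes:      full opcode sequence of the program
    method_spans: list of (start, end) index pairs, one per method;
                  this grouping is a hypothetical stand-in for the
                  program's real class/method hierarchy
    Returns a list of (mask, perturbed_sequence) pairs, where mask has
    one bit per method and dropped methods are removed wholesale.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # One binary decision per method, not per opcode.
        mask = [rng.random() < 0.5 for _ in method_spans]
        perturbed = []
        for keep, (start, end) in zip(mask, method_spans):
            if keep:
                perturbed.extend(opcodes[start:end])
        samples.append((mask, perturbed))
    return samples
```

Because each mask bit covers many opcodes, flipping it can meaningfully change a classifier's output, which is what makes the fitted surrogate's weights, and hence the explanation, sparser and more informative than per-opcode perturbation.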
