The evolution of IoT malware and the effectiveness of defense strategies, e.g., those leveraging malware family classification, have driven the development of advanced classification learning models. These models, particularly those that rely on model-extracted features, substantially improve classification performance while reducing the need for extensive expert knowledge from developers. However, a critical challenge lies in the limited interpretability of these learning models, which can obscure potential security risks. Among these risks are backdoor attacks, a sophisticated and deceptive threat in which attackers induce malicious behaviors in a model that are activated only by specific triggers.

In response to the growing need for integrity and reliability in these models, this work assesses the vulnerability of state-of-the-art IoT malware classification models to backdoor attacks. Given the complexities of attacking model-based classifiers, we propose a novel trigger generation framework, B-CTG, supported by a specialized training procedure that enables B-CTG to dynamically poison or attack samples to achieve specific objectives. From an attacker's perspective, the design and training of B-CTG incorporate knowledge from the IoT domain to ensure the attack's effectiveness. We conduct experiments under two distinct knowledge assumptions: the main evaluation, which assesses the attack's performance when the attacker has limited control over the model training pipeline, and the transferred setting, which further explores how the attacker's knowledge affects attack predictions in real-world scenarios.

Our in-depth analysis focuses on attack performance in specific scenarios rather than a broad examination across many scenarios. Results from the main evaluation demonstrate that the proposed attack strategy can achieve high success rates even at low poisoning ratios, though stability remains a concern. Moreover, the inconsistent trends in model performance suggest that designers may struggle to detect a model's poisoned state from its performance alone. The transferred setting highlights the critical importance of model and feature knowledge for successful attack predictions, with feature knowledge proving particularly crucial. This insight prompts further investigation into model-agnostic mitigation methods and their effectiveness against the proposed attack strategy, with findings indicating that stability remains a significant concern for both attackers and defenders.
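To make the backdoor setting assumed above concrete, the sketch below shows a minimal, generic form of feature-space data poisoning: a small fraction of training samples is stamped with a trigger pattern and relabeled to an attacker-chosen target class. This is only an illustration of the general setting, not the paper's B-CTG framework, whose triggers are generated dynamically by a learned model; the function name, the fixed additive trigger, and the poison_ratio parameter are illustrative assumptions.

```python
import numpy as np

def poison_dataset(X, y, trigger, target_label, poison_ratio=0.01, rng=None):
    """Generic feature-space backdoor poisoning (illustrative only).

    A small fraction of samples receives an additive trigger pattern
    and has its label flipped to the attacker's target class.
    """
    rng = np.random.default_rng() if rng is None else rng
    X_p, y_p = X.copy(), y.copy()
    n_poison = max(1, int(poison_ratio * len(X_p)))
    idx = rng.choice(len(X_p), size=n_poison, replace=False)
    X_p[idx] = X_p[idx] + trigger   # stamp the trigger into the extracted features
    y_p[idx] = target_label         # relabel to the attacker-chosen family
    return X_p, y_p, idx

# Illustrative usage: 256-dimensional model-extracted feature vectors,
# 10 hypothetical malware families, a fixed sparse trigger, 1% poisoning.
X = np.random.rand(1000, 256).astype(np.float32)
y = np.random.randint(0, 10, size=1000)
trigger = np.zeros(256, dtype=np.float32)
trigger[:8] = 0.5                   # perturb a few feature dimensions
X_poisoned, y_poisoned, poisoned_idx = poison_dataset(X, y, trigger, target_label=0)
```

In this simplified view, the poisoning ratio mentioned in the abstract corresponds to poison_ratio: the attack aims for a high success rate on triggered inputs while keeping this fraction, and hence the impact on clean accuracy, as small as possible.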