Abstract
Developing and evaluating malware classification pipelines that reflect real-world needs is as vital to protecting users as it is hard to achieve. In many cases, the experimental conditions under which an approach is developed mismatch its deployment settings, causing the solution to fall short of the desired results. In this work, we explore how unrealistic design and evaluation decisions in the literature are. In particular, we shed light on the problem of label delays: the assumption that ground-truth labels for classifier retraining are always available, when in the real world they take significant time to produce, which also opens a significant attack opportunity window. Among other aspects, our analyses address: (1) the use of metrics that do not account for the effect of time; (2) the occurrence of concept drift and idealized assumptions about the amount of drifting data a system can handle; and (3) idealized assumptions about the availability of oracle data for drift detection and the need to rely on pseudo-labels to mitigate drift-related delays. We present experiments based on a newly proposed exposure metric showing that labels delayed by limited analysis queue sizes pose a significant challenge for detection (e.g., up to a 75% greater attack opportunity in the real world than in the experimental setting) and that pseudo-labels help mitigate these delays (reducing the detection loss to only 30% of its original value).