Abstract
A malware phylogeny model is an estimation of the derivation relationships between a set of malware samples. Systems that construct phylogeny models are expected to be useful for malware analysts. While several such systems have been proposed, little is known about the consistency of their results on different data sets, about their generalizability across different types of malware evolution. This paper explores these issues using two artificial malware history generators: systems that simulate malware evolution according to different evolution models. A quantitative study was conducted using two phylogeny model construction systems and multiple samples of artificial evolution. High variability was found in the quality of their results on different data sets, and the systems were shown to be sensitive to the characteristics of evolution in the data sets. The results call into question the adequacy of evaluations typical in the field, raise pragmatic concerns about tool choice for malware analysts, and underscore the important role that model-based simulation is expected to play in evaluating and selecting suitable malware phylogeny construction systems.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have