Abstract
Multi-segment patterns are a common form of virus signature; version 54 of the ClamAV virus database contains 2229 of them. We observe that (i) the pattern set contains over 100 nondistinctive short segments, e.g., 2 bytes of zeros; (ii) some 2-byte segments appear many times in one or more patterns; (iii) some patterns contain a large number of 2-byte segments; (iv) many short segments are substrings or suffixes of longer segments; and (v) adjacent segments may contain overlapping bytes. These properties pose great difficulties for conventional detection methods. Instead of viewing a virus signature as a byte sequence, we regard a pattern as a sequence of tokens, where each token corresponds to a segment. We transform the input byte stream into a token stream, which the detection engine then processes to determine whether any virus signature is present. Our detection method for the 2229 multi-segment patterns can be implemented on a field-programmable gate array (FPGA) using 290 KB of on-chip memory. The device operates at 170 MHz and processes 1 byte per cycle. The processing architecture is memory based, so the FPGA need not be reconfigured when the pattern set is updated.
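The token-stream idea can be illustrated with a small software sketch. This is not the FPGA architecture described above, only a minimal model of the two stages under simplifying assumptions: segments are found by a naive scan, any gap is allowed between consecutive segments of a pattern, and matched segments may not overlap. The names `SEGMENTS`, `PATTERNS`, `tokenize`, and `detect`, and the example signatures, are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical segment table: each distinct segment is assigned one token ID.
SEGMENTS: Dict[bytes, int] = {
    b"\x00\x00": 0,          # a nondistinctive 2-byte segment
    b"MZ\x90": 1,
    b"\xde\xad\xbe\xef": 2,
}

# Hypothetical patterns, each expressed as an ordered list of token IDs.
PATTERNS: Dict[str, List[int]] = {
    "Demo.A": [1, 0, 2],
    "Demo.B": [2, 1],
}

Token = Tuple[int, int, int]  # (start offset, end offset, token ID)


def tokenize(data: bytes, segments: Dict[bytes, int]) -> List[Token]:
    """Stage 1: turn the byte stream into a token stream ordered by end offset."""
    tokens: List[Token] = []
    for seg, tok in segments.items():
        start = data.find(seg)
        while start != -1:
            tokens.append((start, start + len(seg), tok))
            start = data.find(seg, start + 1)  # also report overlapping hits
    tokens.sort(key=lambda t: (t[1], t[0]))
    return tokens


def detect(tokens: List[Token], patterns: Dict[str, List[int]]) -> List[str]:
    """Stage 2: report patterns whose tokens appear in order without overlap."""
    progress = {name: 0 for name in patterns}   # next expected position
    last_end = {name: 0 for name in patterns}   # end of last matched segment
    matched: List[str] = []
    for start, end, tok in tokens:
        for name, seq in patterns.items():
            k = progress[name]
            if k < len(seq) and seq[k] == tok and start >= last_end[name]:
                progress[name] = k + 1
                last_end[name] = end
                if k + 1 == len(seq):
                    matched.append(name)
    return matched


data = b"xxMZ\x90yy\x00\x00\x00zz\xde\xad\xbe\xef"
hits = detect(tokenize(data, SEGMENTS), PATTERNS)
print(hits)  # ['Demo.A']
```

Working on token IDs means the detection stage never re-examines raw bytes, so a short segment that occurs in many patterns is recognized once and shared. In this sketch, updating the pattern set only changes the two tables, which loosely mirrors the memory-based design in which the FPGA need not be reconfigured.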