Abstract

YARA is a pattern matching tool used by malware analysts all over the world. It can scan files as well as process memory, and it allows us to define sequences of symbols as text strings, hexadecimal strings, and regular expressions. However, regular expressions are used sparingly because of the concern that they can slow down the scanning process. In this paper, we analyze the true nature of regular expressions in YARA and their implementation. We have discovered several reasons why regular expressions can slow down scanning, all rooted in the nature of the algorithm used, Aho-Corasick. We have proposed a new version of this algorithm and have implemented it in the original version of the tool. We present experiments showing that the speed of pattern matching with regular expressions can indeed be improved. In selected cases, the proposed version was about 27% faster than the original version. In instances where strings were optimized for the original version, the speed of the two versions was comparable.

Highlights

  • In this paper, the pattern matching problem is understood as finding all occurrences of patterns in the input

  • The pattern matching problem is the problem of finding all valid shifts with which all given patterns p occur in a given text T

  • YARA is a well-known tool in malware detection and has become a vital part of threat intelligence infrastructure in many companies worldwide

Summary

Introduction

The pattern matching problem is understood as finding all occurrences of patterns in the input, where the use of regular expressions is allowed as well. We assume that the text is an array T[1..n] of length n whose characters are drawn from a finite alphabet. A pattern is a string or a regular expression. Each pattern p represents one or more literal strings p[1..m] of length m over the alphabet. We say that pattern p occurs with shift s in text T if 0 ≤ s ≤ n − m and T[s + 1..s + m] = p[1..m]. The pattern matching problem is the problem of finding all valid shifts with which all given patterns p occur in a given text T.
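To make the notion of a valid shift concrete, the following is a minimal Python sketch of the definition only. It is a naive scan, not the Aho-Corasick approach used by YARA and not the algorithm proposed in the paper. The function names valid_shifts and regex_shifts are hypothetical and chosen for illustration. Python strings are 0-indexed, so shift s corresponds to the slice text[s:s + m].

```python
import re


def valid_shifts(text: str, pattern: str) -> list[int]:
    """Return every shift s with 0 <= s <= n - m and text[s:s + m] == pattern.

    Naive O(n * m) scan that mirrors the definition directly
    (illustrative only; not how YARA searches).
    """
    n, m = len(text), len(pattern)
    return [s for s in range(n - m + 1) if text[s:s + m] == pattern]


def regex_shifts(text: str, regex: str) -> list[tuple[int, str]]:
    """Return (shift, matched literal) pairs for a regular expression.

    A regular expression stands for a set of literal strings. For each
    start position we record the single literal that Python's engine
    prefers there, skipping empty matches. Other literals the expression
    could match at the same shift are not enumerated.
    """
    compiled = re.compile(regex)
    hits = []
    for s in range(len(text) + 1):
        match = compiled.match(text, s)  # anchored at position s
        if match and match.end() > s:
            hits.append((s, match.group()))
    return hits


if __name__ == "__main__":
    print(valid_shifts("abababa", "aba"))
    print(regex_shifts("abcabd", "ab[cd]"))
```

Overlapping occurrences are reported on purpose, since the definition asks for all valid shifts rather than a set of non-overlapping matches.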
