Terminator: A Secure Coprocessor to Accelerate Real-Time AntiViruses Using Inspection Breakpoints
AntiViruses (AVs) are essential to face the myriad of malware threatening Internet users. AVs operate in two modes: on-demand checks and real-time verification. Software-based real-time AVs intercept system and function calls to execute AV’s inspection routines, resulting in significant performance penalties as the monitoring code runs among the suspicious code. Simultaneously, dark silicon problems push the industry to add more specialized accelerators inside the processor to mitigate these integration problems. In this article, we propose Terminator, an AV-specific coprocessor to assist software AVs by outsourcing their matching procedures to the hardware, thus saving CPU cycles and mitigating performance degradation. We designed Terminator to be flexible and compatible with existing AVs by using YARA and ClamAV rules. Our experiments show that our approach can save up to 70 million CPU cycles per rule when outsourcing on-demand checks for matching typical, unmodified YARA rules against a dataset of 30 thousand in-the-wild malware samples. Our proposal eliminates the AV’s need for blocking the CPU to perform full system checks, which can now occur in parallel. We also designed a new inspection breakpoint mechanism that signals to the coprocessor the beginning of a monitored region, allowing it to scan the regions in parallel with their execution. Overall, our mechanism mitigated up to 44% of the overhead imposed to execute and monitor the SPEC benchmark applications in the most challenging scenario.
- Dissertation
6
- 10.17918/etd-6329
- Jul 16, 2021
Despite efforts to mitigate the malware threat, the proliferation of malware continues, with record-setting numbers of malware samples being discovered each quarter. Malware is any intentionally malicious software, including software designed for extortion, sabotage, and espionage. Traditional malware defenses are primarily signature-based and heuristic-based, and include firewalls, intrusion detection systems, and antivirus software. Such defenses are reactive, performing well against known threats but struggling against new malware variants and zero-day threats. Together, the reactive nature of traditional defenses and the continuing spread of malware motivate the development of new techniques to detect such threats. One promising set of techniques uses features extracted from system call traces to infer malicious behaviors. This thesis studies the problem of detecting and classifying malicious processes using system call trace analysis. The goal of this study is to identify techniques that are 'lightweight' enough and exhibit a low enough false positive rate to be deployed in production environments. The major contributions of this work are (1) a study of the effects of feature extraction strategy on malware detection performance; (2) the comparison of signature-based and statistical analysis techniques for malware detection and classification; (3) the use of sequential detection techniques to identify malicious behaviors as quickly as possible; (4) a study of malware detection performance at very low false positive rates; and (5) an extensive empirical evaluation, wherein the performance of the malware detection and classification systems is evaluated against data collected from production hosts and from the execution of recently discovered malware samples. The outcome of this study is a proof-of-concept system that detects the execution of malicious processes in production environments and classifies them according to their similarity to known malware.
- Conference Article
35
- 10.1109/issre.2011.15
- Nov 1, 2011
We have previously reported [1] the results of an exploratory analysis of the potential gains in detection capability from using diverse AntiVirus products. The analysis was based on 1599 malware samples collected from a distributed honeypot deployment over a period of 178 days. The malware samples were sent to the signature engines of 32 different AntiVirus products hosted by the VirusTotal service. The analysis suggested significant gains in detection capability from using more than one AntiVirus product in a one-out-of-two intrusion-tolerant setup. In this paper we present new analysis of this dataset to explore the detection gains that can be achieved from using more diversity (i.e., more than two AntiVirus products), how diversity may help to reduce the "at risk time" of a system, and a preliminary model-fitting using the hyper-exponential distribution.
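The one-out-of-N idea in this abstract can be illustrated with a small sketch: a sample counts as detected when at least one product in the group flags it. The product names, sample ids, and helper functions below are hypothetical and are not taken from the paper's dataset.

```python
from itertools import combinations

# Hypothetical detection results: product name -> set of sample ids it flagged.
detections = {
    "av_a": {1, 2, 3, 5},
    "av_b": {2, 3, 4},
    "av_c": {1, 4, 6},
}
all_samples = {1, 2, 3, 4, 5, 6}


def one_out_of_n_rate(products, detections, samples):
    """Fraction of samples flagged by at least one product in the group."""
    caught = set().union(*(detections[p] for p in products))
    return len(caught & samples) / len(samples)


def best_group(n, detections, samples):
    """Return the n-product group with the highest one-out-of-n detection rate."""
    return max(
        combinations(sorted(detections), n),
        key=lambda group: one_out_of_n_rate(group, detections, samples),
    )
```

With this toy data, pairing two products that miss different samples raises the combined rate above what either achieves alone, which is the effect the diversity analysis measures.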
- Research Article
2
- 10.3844/jcssp.2017.290.300
- Aug 1, 2017
- Journal of Computer Science
Antivirus (AV) products are used by the home-user community for protection. To some extent, the AV meets users' expectations by detecting previously known malware samples. In this study, we question the set of events that should trigger the AV to scan data. Scanning every single piece of data as it moves from one location to another could be a demanding and performance-killing task. The AV faces a design challenge when deciding what kind of data to scan and when to do so. Typically, the on-access scanner component of the AV scans data as it moves to or from hard drives. Other kinds of data movement are equally important: for example, data moves between different memory locations or between memory and the network. In this study, we explore what the AV needs to do upon various data movements. We design and implement a system capable of scanning memory when necessary. We recognize and intercept the most effective API calls that involve memory. Afterwards, we extract the data involved and scan it if it has not been scanned before. We test our system against 15 real malware samples and find that it detects all of them. Furthermore, we provide a thorough performance study to present the overhead of our system.
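The "scan it if it has not been scanned before" step can be sketched as a digest-keyed verdict cache. This is a minimal illustration under assumptions: the `ScanCache` class and `toy_scanner` function are hypothetical, and the abstract does not say how the authors track previously scanned data.

```python
import hashlib


class ScanCache:
    """Remember verdicts for buffers that were already scanned."""

    def __init__(self, scanner):
        self._scanner = scanner   # callable: bytes -> bool (True means malicious)
        self._verdicts = {}       # SHA-256 digest -> cached verdict
        self.scans_performed = 0  # counts real scans, not cache hits

    def check(self, data: bytes) -> bool:
        digest = hashlib.sha256(data).digest()
        if digest not in self._verdicts:
            self.scans_performed += 1
            self._verdicts[digest] = self._scanner(data)
        return self._verdicts[digest]


def toy_scanner(data: bytes) -> bool:
    """Hypothetical signature check used only for illustration."""
    return b"EVIL" in data
```

Repeated calls with the same buffer reuse the cached verdict, so the expensive scan runs once per distinct buffer content.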
- Conference Article
8
- 10.1109/saint.2012.49
- Jul 1, 2012
Modern malware often changes its runtime behavior in each execution to resist malware analysis and detection. For example, when malware copies itself to a file system, it can choose its file name at random to avoid detection. Another example is that when malware tries to connect to its command and control server, it randomly chooses a domain name from a hard-coded list to avoid being blocked by a static blacklist of malicious domain names. We assume that such random behaviors are unnecessary for benign software, so they can serve as clues to distinguish malware from benign software. In this paper, we propose a novel malware detection method based on investigating the behavioral differences across multiple executions of suspicious software. Our proposed method conducts dynamic analysis on an executable file multiple times in the same sandbox environment to obtain several lists of API call sequences, and then compares the lists to find the differences between the executions. In experiments with 5,697 malware samples and 819 benign software samples, the proposed method detected about 67% of the malware samples with a false positive rate of about 1%. Moreover, the proposed method detected 117 of the 273 malware samples that antivirus software could not detect. We therefore confirmed that the proposed method may improve the accuracy of malware detection when used in combination with other existing methods.
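Comparing API call lists across runs might look like the sketch below. It assumes each trace is a list of `(api_name, argument)` tuples and uses `difflib.SequenceMatcher` as a stand-in comparison; the abstract does not specify the authors' comparison method, and the 0.9 threshold is an arbitrary illustrative value.

```python
from difflib import SequenceMatcher


def run_similarity(trace_a, trace_b):
    """Similarity in [0, 1] between two API call traces."""
    return SequenceMatcher(None, trace_a, trace_b, autojunk=False).ratio()


def looks_randomized(traces, threshold=0.9):
    """Flag a sample when any pair of runs is less similar than the threshold."""
    pairs = [(a, b) for i, a in enumerate(traces) for b in traces[i + 1:]]
    return any(run_similarity(a, b) < threshold for a, b in pairs)


# Hypothetical traces: the file name and domain change between runs.
run1 = [("CreateFile", "a9f3.exe"), ("connect", "alpha.example")]
run2 = [("CreateFile", "77bc.exe"), ("connect", "beta.example")]
stable = [("CreateFile", "report.txt"), ("connect", "update.example")]
```

Here `looks_randomized([run1, run2])` flags the sample, while repeated identical traces such as `[stable, stable]` do not.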
- Research Article
61
- 10.1504/ijsn.2007.012824
- Jan 1, 2007
- International Journal of Security and Networks
Fast virus scanning is becoming increasingly important in today's internet. While Moore's law continues to double CPU cycle speed, virus scanning applications fail to ride on the performance wave due to their frequent random memory accesses. This paper proposes Hash-AV, a virus scanning 'booster' technique that aims to take advantage of improvements in CPU performance. Using a set of hash functions and a Bloom filter array that fits in CPU second-level (L2) caches, Hash-AV determines the majority of 'no-match' cases without accesses to main memory. Experiments show that Hash-AV improves the performance of the open-source virus scanner Clam-AV by a factor of 2–10. The key to Hash-AV's success lies in a set of 'bad but cheap' hash functions that are used as initial hashes. The speed of Hash-AV makes it well suited for 'on-access' virus scanning, providing greater protections to the user. Through intercepting system calls and wrapping glibc libraries, we have implemented an 'on-access' version for Hash-AV+Clam-AV. The on-access scanner can examine input data at a throughput of over 200 Mb/s, making it suitable for network-based virus scanning.
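The filtering idea described here (cheap hashes first, a Bloom filter to reject most windows, exact verification only for survivors) can be sketched as follows. This is an illustration under assumptions, not Hash-AV's implementation: the window length, filter size, and both hash functions are chosen for brevity, and every signature is assumed to be at least `WINDOW` bytes long.

```python
import hashlib

WINDOW = 4             # assumed signature prefix length, in bytes
FILTER_BITS = 1 << 16  # assumed filter size, small enough to stay cache-resident


def cheap_hash(chunk: bytes) -> int:
    # "Bad but cheap": plain arithmetic on the raw bytes.
    return int.from_bytes(chunk, "little") % FILTER_BITS


def strong_hash(chunk: bytes) -> int:
    # Better distribution, but more expensive to compute.
    digest = hashlib.blake2b(chunk, digest_size=4).digest()
    return int.from_bytes(digest, "little") % FILTER_BITS


def build_filter(signatures):
    """Set one bit per hash function for each signature prefix."""
    bits = bytearray(FILTER_BITS // 8)
    for sig in signatures:
        for hash_fn in (cheap_hash, strong_hash):
            idx = hash_fn(sig[:WINDOW])
            bits[idx // 8] |= 1 << (idx % 8)
    return bits


def bit_is_set(bits, idx):
    return bool(bits[idx // 8] & (1 << (idx % 8)))


def scan(data: bytes, signatures, bits):
    """Return (offset, signature) for every exact match found in data."""
    hits = []
    for off in range(len(data) - WINDOW + 1):
        chunk = data[off:off + WINDOW]
        if not bit_is_set(bits, cheap_hash(chunk)):
            continue  # most windows are rejected by the cheap hash alone
        if not bit_is_set(bits, strong_hash(chunk)):
            continue
        for sig in signatures:  # slow path: exact verification
            if data.startswith(sig, off):
                hits.append((off, sig))
    return hits
```

Because the cheap hash is evaluated first, the more expensive hash and the exact comparison run only for the small fraction of windows that pass the first bit test.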
- Research Article
18
- 10.1049/iet-ifs.2012.0192
- Jun 1, 2013
- IET Information Security
It is well accepted that basic protection against common cyber threats is important, so it is recommended to have antivirus (AV). However, what price do users pay in terms of performance and other usability factors? Although it is important for security researchers and system developers to understand how exactly the AV impacts the whole system, in this study the authors take the approach of tracing operating system (OS) events. The authors’ goal is to shed some light on this. To the best of the authors’ knowledge, this study is the first to present an OS‐aware approach to analyse and reason about AV performance impact. The authors’ results show that the main reason for performance degradation in the tasks the authors tested with AV software is that they mainly spend the extra time waiting on events. Sometimes AV does cause some central processing unit overhead, but events such as hard page faults (i.e. those that require disk accesses) are the main contributing factor to AV overhead. Owing to the AV's intrusive behaviour, the tasks in the authors’ experiments are caused to create more file input/output operations, page faults, system calls and threads than they normally do without AV installed.
- Conference Article
9
- 10.1145/3422575.3422775
- Sep 28, 2020
Fileless malware are recent threats to computer systems that load directly into memory, and whose aim is to prevent anti-viruses (AVs) from successfully matching byte patterns against suspicious files written on disk. Their detection requires that software-based AVs continuously scan memory, which is expensive due to repeated locks and polls. However, research advances introduced near-memory and in-memory processing, which allow memory controllers to trigger basic computations without moving data to the CPU. In this paper, we address AVs' performance overhead by moving them to the hardware, i.e., we propose instrumenting processors’ memory controller or smart memories (near- and in-memory malware detection, respectively) to accelerate memory scanning procedures. To do so, we present MINI-ME, the Malware Identification based on Near- and In-Memory Evaluation mechanism, a hardware-based AV accelerator that interrupts the program’s execution if malicious patterns are discovered in its memory. We prototyped MINI-ME in a simulator and tested it with a set of 21 thousand in-the-wild malware samples, matching multiple signatures with less than 1% performance overhead, a 100% detection rate, and zero false positives and false negatives.
- Research Article
8
- 10.1007/s11416-009-0136-2
- Sep 25, 2009
- Journal in Computer Virology
In this paper, we propose an original black-box approach to antivirus product evaluation. Contrary to classical tests focusing on detection rates for a specific malware sample, we use a generic metamorphic engine to observe the detection products' behaviors. We believe that this point of view presents a double interest. First, it offers an original way of evaluating current antivirus products focusing on the observed detection technique. More precisely, the use of metamorphic malware makes static signature-based detection difficult, so that the evaluation focuses only on heuristic and behavioral detection approaches. Second, by pointing out current detection capabilities, we practically evaluate the danger that complex metamorphic malware could represent. To achieve this goal, we start with the description of a generic metamorphic engine acting in two steps: obfuscation and modeling. Then, we apply this engine to a real mass-mailing worm and submit the resulting metamorphic malware samples to current antivirus products. The observed results lead to a classification of detection techniques in two main categories: the first one, relying on static detection techniques, presents low detection rates obtained by heuristic analysis. The second one, composed of behavioral detection programs, mainly focuses on elementary suspicious actions. In all cases, no product was able to detect a global malware behavior. Consequently, we consider that metamorphic malware detection still represents a real challenge for antivirus products. Through this study, we hope to help defenders understand and defend against the threat represented by this class of malware.
- Conference Article
40
- 10.1109/glocom.2005.1577953
- Jan 1, 2005
Fast virus scanning is becoming increasingly important in today's Internet. While Moore's law continues to double CPU cycle speed, virus scanning applications fail to ride on the performance wave due to their frequent random memory accesses. This paper proposes Hash-AV, a virus scanning "booster" technique that aims to take advantage of improvements in CPU performance. Using a set of hash functions and a bloom filter array that fits in CPU second-level (L2) caches, Hash-AV determines the majority of "no-match" cases without accesses to main memory. Experiments show that Hash-AV improves the performance of the open-source virus scanner Clam-AV by a factor of 2.5 to 10. The key to Hash-AV's success lies in a set of "bad but cheap" hash functions that are used as initial hashes. The speed of Hash-AV makes it well suited for "on-access" virus scanning, providing greater protections to the user. Through intercepting system calls and wrapping glibc libraries, we have implemented an "on-access" version for Hash-AV+Clam-AV. The on-access scanner can examine input data at a throughput of over 200 Mb/s, making it suitable for network-based virus scanning.
- Book Chapter
238
- 10.1007/978-3-642-16161-2_2
- Jan 1, 2010
- Lecture notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
In this paper we address the following questions: From a networking perspective, do malicious programs (malware, bots, viruses, etc.) behave differently from benign programs that run daily for various needs? If so, how may we exploit the differences in network behavior to detect them? To address these questions, we are systematically analyzing the behavior of a large set (on the order of 2,000) of malware samples. We present our initial results after analyzing 1000 malware samples. The results show that malicious and benign programs behave quite differently from a network perspective. We are still in the process of attempting to interpret the differences, which nevertheless have been utilized to detect 31 malware samples which were not detected by any antivirus software on Virustotal.com as of 01 April 2010, giving evidence that the differences between malicious and benign network behavior have a possible use in helping stop zero-day attacks on a host machine.
- Research Article
7
- 10.1016/j.jksuci.2023.101898
- Dec 28, 2023
- Journal of King Saud University - Computer and Information Sciences
An empirical study of problems and evaluation of IoT malware classification label sources
- Conference Article
14
- 10.1109/iccke.2013.6682867
- Oct 1, 2013
Malicious software, also called malware, is one of the major threats on the Internet today. Despite various antivirus programs, thousands of Internet hosts are infected daily with malware, such as viruses, worms, and Trojan horses. By using a variety of obfuscation techniques, polymorphic malware can easily evade signature-based detection techniques by continually changing its appearance or patterns. However, all polymorphic malware samples in the same malware family often follow the same behavioral pattern, which can be used to generate a behavioral signature. In this paper, we propose MalHunter, a novel method based on sequence clustering and sequence alignment for the automatic generation of behavioral signatures for polymorphic malware detection. We first generate a set of behavioral sequences for different samples of a polymorphic malware, each of which represents a thread's behavior. We then group similar behavioral sequences into the same cluster and generate an alignment pattern for each cluster. We finally build a multiple behavioral signature for the polymorphic malware. MalHunter stores fewer signatures in the signature database due to the generation of a multiple behavioral signature for different samples of each polymorphic malware. The experimental results on a malware collection suggest that MalHunter is both precise and succinct for effective matching and detection of polymorphic malware.
- Book Chapter
18
- 10.1007/978-3-642-33704-8_20
- Jan 1, 2012
Over the past years, we have experienced an increase in the quantity and complexity of malware binaries. This change has been fueled by the introduction of malware generation tools and reuse of different malcode modules. Recent malware appears to be highly modular and less functionally typified. As a side-effect of this "composition" of components across different malware types, a growing number of new malware samples cannot be explicitly assigned to traditional classes defined by Anti-Virus (AV) vendors. Indeed, by nature, clustering techniques capture dominant behavior that could be a manifestation of only one of the malware components, failing to reveal malware similarities that depend on other, less dominant components and other evolutionary traits. In this paper, we introduce a novel malware behavioral commonality analysis scheme that takes into consideration component-wise grouping, called behavioral mapping. Our effort attempts to shed light on malware behavioral relationships and go beyond simply clustering the malware into a family. To this end, we implemented a method for identifying soft clusters and revealing shared malware components and traits. Using our method, we demonstrate that a malware sample can belong to several groups (clusters), implying sharing of its respective components with other samples from the groups. We performed experiments with a large corpus of real-world malware datasets and found that we can successfully highlight malware component relationships across the existing AV malware families and variants.
- Conference Article
352
- 10.1145/1653662.1653736
- Nov 9, 2009
A major challenge of the anti-virus (AV) industry is how to effectively process the huge influx of malware samples they receive every day. One possible solution to this problem is to quickly determine if a new malware sample is similar to any previously-seen malware program. In this paper, we design, implement and evaluate a malware database management system called SMIT (Symantec Malware Indexing Tree) that can efficiently make such determination based on malware's function-call graphs, which is a structural representation known to be less susceptible to instruction-level obfuscations commonly employed by malware writers to evade detection of AV software. Because each malware program is represented as a graph, the problem of searching for the most similar malware program in a database to a given malware sample is cast into a nearest-neighbor search problem in a graph database. To speed up this search, we have developed an efficient method to compute graph similarity that exploits structural and instruction-level information in the underlying malware programs, and a multi-resolution indexing scheme that uses a computationally economical feature vector for early pruning and resorts to a more accurate but computationally more expensive graph similarity function only when it needs to pinpoint the most similar neighbors. Results of a comprehensive performance study of the SMIT prototype using a database of more than 100,000 malware demonstrate the effective pruning power and scalability of its nearest neighbor search mechanisms.
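The two-stage search described here (prune with a cheap feature vector, then apply an expensive similarity function to the survivors) can be sketched as below. This is not SMIT's index or similarity function: graphs are modeled as sets of `(caller, callee)` edges, the feature vector is just node and edge counts, and Jaccard similarity over edges stands in for the graph similarity computation. All names and data are hypothetical.

```python
def features(graph):
    """Cheap feature vector: (node count, edge count)."""
    nodes = {node for edge in graph for node in edge}
    return (len(nodes), len(graph))


def cheap_distance(fa, fb):
    """L1 distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(fa, fb))


def edge_similarity(ga, gb):
    """Jaccard similarity over edges; stand-in for a costly graph comparison."""
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


def nearest(query, database, keep=2):
    """Prune to the `keep` closest feature vectors, then compare graphs in full."""
    qf = features(query)
    candidates = sorted(
        database, key=lambda name: cheap_distance(qf, features(database[name]))
    )[:keep]
    return max(candidates, key=lambda name: edge_similarity(query, database[name]))


# Hypothetical function-call graphs.
database = {
    "fam_a": {("main", "decrypt"), ("decrypt", "connect"), ("main", "spread")},
    "fam_b": {("main", "log")},
    "fam_c": {("main", "decrypt"), ("decrypt", "connect"), ("main", "wipe")},
}
query = {("main", "decrypt"), ("decrypt", "connect"),
         ("main", "spread"), ("spread", "mail")}
```

In this toy database, `fam_b` is discarded by the feature-vector stage, so the costlier edge comparison runs only against `fam_a` and `fam_c`.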
- Conference Article
1
- 10.1109/asiancon55314.2022.9909111
- Aug 26, 2022
Malware detection models are built primarily around signature-based or behavior-based detection. In this paper, anti-forensic techniques are used to hide malware from malware scanners by applying various changes to the malware's source code to prevent its detection. I worked on two models with interchanging payloads and code segments to check the performance in each case. The experiment used many malware samples from recent attacks, covering different malware families and intended attack areas, to check detection rates. New payloads were also created and merged with existing malware to understand the behavior and combination of payloads for multi-system attacks, with detection rates measured using VirusTotal. Payload delivery relied on several obfuscation techniques: encoding the payload, code splitting, adding encryption, backdooring the file, code-injection payloads, and steganographic methods that carry the payload while maintaining signature evasion. Manual unpacking was used to unpack the malware and deliver the final attack, and a framework of automated deployment methods is laid out for further work.