Finding contextual clues to malware using a large corpus

Neil C Rowe

doi:10.1109/iscc.2015.7405521

Abstract

Identifying malware is a critical problem in computer security. Many signature-identification, behavior-recognition, and reputation-based tools are available for host-based detection. However, systems today hold so many files that checking all of them is time-consuming, and better methods are needed to suggest which files have the highest priority for checking in partial scans. This work developed and tested local contextual clues to malware in file-system metadata, using an international corpus of 248 million files on 3961 drives. Five methods found 398,949 hash values of malware in this corpus, and three methods selected 3,681,211 hash values of non-malware for comparison. Malware identification rates were compared across the fifteen combinations (five malware-identification methods paired with three non-malware selection methods) and were cross-correlated for different types of drives and file types. The results showed that different malware identification methods identify significantly different sets of files. Then the strength of particular local clues in file metadata (directory and file names, sizes, times, and hash values) was assessed, and the results were compared across the fifteen combinations. Some classic clues (e.g., rare file extensions and deletion status) were confirmed, while others (e.g., double extensions and occurrence in the operating system) were not. With this data, a program was implemented to estimate the likelihood that a given file is malware based solely on its metadata context. On three random subsets of the corpus, these methods gave 51 times better precision (the fraction of files identified as malware that actually are malware) with 70% better recall (the fraction of malware detected) than the approach of inspecting executables alone. They also ran significantly faster than signature checking and can be used before other kinds of malware analysis.
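
To make the ideas in the abstract concrete, the sketch below shows one possible way to combine metadata clues into a malware likelihood and to compute precision and recall as the abstract defines them. It is not the paper's program: the abstract does not describe how clues are combined, so the odds-multiplication scheme, the `FileMeta` record, the `RARE_EXTENSIONS` set, and every numeric weight are hypothetical placeholders chosen only for illustration.

```python
import os
from dataclasses import dataclass


@dataclass
class FileMeta:
    """Hypothetical metadata record; the field names are assumptions."""
    path: str
    deleted: bool


# Illustrative placeholders only; these are NOT values from the paper.
RARE_EXTENSIONS = {"scr", "pif"}
RARE_EXTENSION_RATIO = 5.0   # assumed likelihood ratio for a rare extension
DELETED_RATIO = 2.0          # assumed likelihood ratio for a deleted file


def clue_ratios(f: FileMeta) -> list[float]:
    """Return the assumed likelihood ratios of the clues that fire for f."""
    ratios = []
    ext = os.path.splitext(f.path)[1].lstrip(".").lower()
    if ext in RARE_EXTENSIONS:
        ratios.append(RARE_EXTENSION_RATIO)
    if f.deleted:
        ratios.append(DELETED_RATIO)
    return ratios


def malware_probability(f: FileMeta, prior: float = 0.001) -> float:
    """Multiply prior odds by each clue's ratio (naive independence assumed)."""
    odds = prior / (1.0 - prior)
    for ratio in clue_ratios(f):
        odds *= ratio
    return odds / (1.0 + odds)


def precision_recall(flagged: set[str], actual: set[str]) -> tuple[float, float]:
    """Precision: fraction of flagged files that are malware.
    Recall: fraction of malware that was flagged."""
    true_positives = len(flagged & actual)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall
```

In a partial scan, files could be sorted by `malware_probability` in descending order so that the most suspicious ones are checked first; the resulting flagged set can then be scored with `precision_recall` against known malware hashes.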

Publication Date: Jul 1, 2015
Citations: 12
License type: CC0
