Efficient Malware Analysis Using Metric Embeddings

Ethan M Rudd, Daniel Olszewski, Edward Raff, David Krisiloff, Scott Coull, James Holt

doi:10.1145/3615669

Abstract

Real-world malware analysis consists of a complex pipeline of classifiers and data analysis, from detection to classification of capabilities to retrieval of unique training samples from user systems. In this article, we aim to reduce the complexity of these pipelines through the use of low-dimensional metric embeddings of Windows PE files, which can be used in a variety of downstream applications, including malware detection, family classification, and malware attribute tagging. Specifically, we enrich the labeling of malicious and benign PE files with computationally expensive, disassembly-based information about malicious capabilities. Using this enhanced labeling, we derive several different types of efficient metric embeddings using an embedding neural network trained via contrastive loss, Spearman rank correlation, and combinations thereof. Our evaluation examines performance on a variety of transfer tasks performed on the EMBER and SOREL datasets, demonstrating that low-dimensional, computationally efficient metric embeddings maintain performance with little decay. This offers the potential to quickly retrain for a variety of transfer tasks at significantly reduced overhead and complexity. We conclude with an examination of practical considerations for the use of our proposed embedding approach, such as robustness to adversarial evasion and the introduction of task-specific auxiliary objectives to improve performance on mission-critical tasks.
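As a rough illustration of the two training signals named in the abstract, the sketch below computes a classic pairwise contrastive loss on two embedding vectors and a plain (non-differentiable) Spearman rank correlation between two score lists. It is not the authors' implementation: the function names, the margin value, the Euclidean distance, and the squared-hinge form of the loss are all assumptions chosen for illustration, and the abstract does not say how the Spearman objective is made trainable.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, similar, margin=1.0):
    """Pairwise contrastive loss (assumed form, hypothetical margin).

    Similar pairs are penalized by their squared distance; dissimilar
    pairs are penalized only when they are closer than `margin`.
    """
    d = euclidean(u, v)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

In a setting like the one described, `contrastive_loss` would be applied to embeddings of PE-file pairs judged similar or dissimilar from their labels, while `spearman` could compare an ordering induced by embedding similarity against an ordering induced by label similarity. Both uses are an interpretation of the abstract, not a description of the paper's training procedure.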

Journal: Digital Threats: Research and Practice
Publication Date: Mar 21, 2024
Citations: 2