AAPFE: Aligned Assembly Pre-Training Function Embedding for Malware Analysis

Hairen Gui, Chunyan Zhang, Ke Tang, Fudong Liu, Meng Qiao, Yizhao Huang, Zheng Shan

doi:10.3390/electronics11060940

Abstract

The use of natural language processing to analyze binary data is a popular research topic in malware analysis. Embedding binary code into vectors is an important foundation for building neural network models for binary analysis. Current solutions either embed instruction or basic block sequences into vectors with recurrent neural network models or apply graph algorithms to control flow graphs or annotated control flow graphs to generate binary representation vectors. In malware analysis, most of these studies capture only a single kind of structural information from the binary and rely on a single corpus, which makes it difficult for the resulting vectors to represent the semantics and functionality of binary code effectively. Therefore, this study proposes Aligned Assembly Pre-Training Function Embedding (AAPFE), a function embedding scheme based on pre-training with aligned assembly. The scheme applies data augmentation and a triplet network structure to embedding model training. Each sub-network extracts instruction sequence information with a self-attention mechanism and basic block graph structure information with a graph convolutional network (GCN) model. The embedding model is pre-trained on the resulting aligned assembly triplet function dataset and then evaluated through a series of comparative experiments and application evaluations. The results show that the model outperforms state-of-the-art methods in precision, precision at top N (p@N), and area under the curve (AUC), verifying the effectiveness of aligned assembly pre-training and multi-level information extraction.
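The abstract names the training structure (a triplet network) and the evaluation metrics (precision, p@N, AUC) but does not state the loss function, the distance measure, or the exact p@N definition. The sketch below is a minimal illustration under those gaps: it assumes a standard triplet margin loss over cosine distance with an illustrative margin of 0.5, and it defines p@N as the fraction of the N nearest candidates that are true matches. The function names `cosine_distance`, `triplet_loss`, and `precision_at_n` are hypothetical, not taken from the paper. The anchor and positive are assumed to be embeddings of the same source function in two aligned assembly variants, and the negative an embedding of a different function.

```python
import math
from typing import Dict, Sequence, Set

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cosine similarity (0 = same direction, 2 = opposite)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat a zero vector as unrelated to everything
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.5) -> float:
    """Triplet margin loss (assumed form; the 0.5 margin is illustrative).

    Zero once the negative is at least `margin` farther from the anchor
    than the positive is; otherwise the violation is penalized linearly.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def precision_at_n(query: Vector, candidates: Dict[str, Vector],
                   relevant: Set[str], n: int) -> float:
    """Fraction of the n candidates nearest to `query` that are in `relevant`.

    This p@N definition is an assumption; the paper may define it differently.
    """
    ranked = sorted(candidates,
                    key=lambda name: cosine_distance(query, candidates[name]))
    return sum(1 for name in ranked[:n] if name in relevant) / n


if __name__ == "__main__":
    # Hypothetical toy 2-D embeddings: f_x86 and f_arm stand for the same
    # source function in two aligned assembly variants; g and h are unrelated.
    anchor = [1.0, 0.0]
    candidates = {
        "f_x86": [0.9, 0.1],
        "f_arm": [0.8, 0.3],
        "g": [0.0, 1.0],
        "h": [-1.0, 0.0],
    }
    print(triplet_loss(anchor, candidates["f_x86"], candidates["g"]))  # 0.0: well separated
    print(triplet_loss(anchor, candidates["g"], candidates["f_x86"]))  # positive: wrong order
    for n in (1, 2, 3):
        print(n, precision_at_n(anchor, candidates, {"f_x86", "f_arm"}, n))
```

With these toy vectors the loss is zero when the positive is closer to the anchor than the negative by more than the margin, and p@N drops from 1.0 to 2/3 at N = 3 because only two candidates are true matches. Cosine distance is used here only because it is a common choice for comparing embeddings; Euclidean distance would fit the same loss form.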

Journal: Electronics
Publication Date: Mar 17, 2022
License type: CC BY 4.0

Similar Papers

A comparison between wavelet based static and dynamic neural network approaches for runoff prediction
Muhammad Shoaib ... Mudasser Muneer Khan
Journal of Hydrology | VOL. 535 | 06 Feb 2016

Interpretable recurrent neural network models for dynamic prediction of the extubation failure risk in patients with invasive mechanical ventilation in the intensive care unit.
Zhixuan Zeng ... Xianming Tang
BioData Mining | VOL. 15 | 27 Sep 2022

Recurrent neural networks model for WiFi-based indoor positioning system
Yuan Lukito ... Antonius Rachmat Chrismanto
01 Nov 2017

Process structure-based recurrent neural network modeling for model predictive control of nonlinear processes
Zhe Wu ... Panagiotis D Christofides
Journal of Process Control | VOL. 89 | 09 Apr 2020
