Binary code traceability of multigranularity information fusion from the perspective of software genes

Yizhao Huang, Meng Qiao, Fudong Liu, Xingwei Li, Hairen Gui, Chunyan Zhang

doi:10.1016/j.cose.2022.102607

Abstract

Binary code traceability aims to use the characteristics of anonymous binary code to identify concealed authors or teams, replacing error-prone and time-consuming manual reverse engineering with automated systems. Although source code traceability has progressed significantly, research on tracing binary files remains limited. Hence, we propose a feature extraction method and a deep learning model that exploit the sequence and structure information of binary code to identify the authors of anonymous and malicious binaries and their relations to known binary code families. We further propose a new multigranularity information fusion feature, inspired by biological genes and oriented toward binary code traceability. Evaluations on the Google Code Jam (GCJ) dataset indicate that our method traces binary code to the target author among 1000 candidate authors with an accuracy of 71%. Experimental results further verify the robustness of the proposed model. On malicious code datasets in particular, the method achieved stable traceability accuracy using only a small number of training samples. For malicious code tracking across 300 team organizations, it achieved a code-tracing accuracy of 82%.

Journal: Computers & Security
Publication Date: Jan 8, 2022
Citations: 5

