Abstract

Binary code traceability uses the characteristics of anonymous binary code to identify its concealed authors or teams, replacing error-prone and time-consuming manual reverse engineering with automated systems. Although source code traceability has advanced significantly, research on tracing binary files remains limited. We therefore propose a feature extraction method and a deep learning model that exploit the sequence and structure information of binary code to identify the authors of anonymous and malicious binaries and their relations to known binary code families. We further propose a new multigranularity information fusion feature, modeled on biological genes and designed for binary code traceability. Evaluations on the Google Code Jam (GCJ) dataset indicate that our method attributes binary code to the correct author among 1000 candidates with an accuracy of 71%. Experimental results also verify the robustness of the proposed model. On malicious code datasets in particular, the method achieved stable traceability accuracy using only a small number of training samples. On the malicious code tracing task, covering 300 team organizations, it achieved an accuracy of 82%.

