Fine-Grained Compiler Identification With Sequence-Oriented Neural Modeling

Zhenzhou Tian, Yaqian Huang, Yanping Chen, Borun Xie, Lingwei Chen, Dinghao Wu

doi:10.1109/access.2021.3069227

Open Access

Abstract

Source code can be compiled with different compilers and optimization levels. When recovered from the produced binaries, these compiler details facilitate essential binary analysis tasks, such as malware analysis and software forensics. Most existing approaches identify the compiler details with a strategy based on signature matching or machine learning, and are limited in either detection accuracy or granularity. In this work, we propose NeuralCI (Neural modeling-based Compiler Identification) to infer these compiler details, including compiler family, optimization level, and compiler version, for individual functions. The basic idea is to formulate sequence-oriented neural networks that process normalized instruction sequences generated with a lightweight function abstraction strategy. To evaluate the performance of NeuralCI, we construct a large dataset of 854,858 unique functions collected from 19 widely used real-world projects. The experiments show that NeuralCI achieves, on average, 98.6% accuracy in identifying the compiler family, 95.3% accuracy in identifying the optimization level, 88.7% accuracy in identifying the compiler version, 94.8% accuracy in identifying the compiler family and optimization level together, and 83.0% accuracy in identifying all compiler components simultaneously. It outperforms existing function-level compiler identification methods in both detection accuracy and comprehensiveness.
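To make the idea of a normalized instruction sequence concrete, here is a minimal Python sketch. The rules it uses (memory operands become `MEM`, immediates become `IMM`, call targets become `FUNC`, registers and mnemonics are kept) are assumptions chosen for illustration and are not taken from the paper; the function names are hypothetical as well. It also assumes Intel-syntax disassembly text as input.

```python
import re

# Hypothetical normalization rules for illustration only; the exact
# abstraction rules used by NeuralCI are not stated on this page.
_IMMEDIATE = re.compile(r"^-?(0x[0-9a-fA-F]+|\d+)$")


def normalize_operand(op: str) -> str:
    """Replace volatile operand details with coarse placeholder tokens."""
    op = op.strip()
    if "[" in op:                 # memory reference such as [rbp-0x8]
        return "MEM"
    if _IMMEDIATE.match(op):      # literal constant such as 0x10 or 42
        return "IMM"
    return op.lower()             # registers and anything else are kept


def normalize_instruction(ins: str) -> str:
    """Normalize one Intel-syntax instruction string."""
    mnemonic, _, rest = ins.strip().partition(" ")
    mnemonic = mnemonic.lower()
    if not rest.strip():          # no operands, e.g. "ret"
        return mnemonic
    if mnemonic == "call":        # call targets vary per binary
        return "call FUNC"
    operands = [normalize_operand(o) for o in rest.split(",")]
    return mnemonic + " " + ",".join(operands)


def normalize_function(instructions):
    """Turn a function's instruction list into a normalized token sequence."""
    return [normalize_instruction(i) for i in instructions]
```

Each normalized instruction can then be treated as one token of the sequence that a sequence-oriented model consumes, which keeps the vocabulary small compared with using raw addresses and constants.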

Highlights

In the software production process, diverse toolchains and toolchain settings can be adopted to transform the source code to the final binary.
As its major subtask to focus on the compilation phase, compiler identification attempts to infer from a piece of binary code the compiler-related details such as the specific compiler family, the optimization options, etc., which can facilitate essential binary analysis tasks such
EVALUATION: In the following parts, we evaluate the performance of NeuralCI on identifying the compiler family, optimization level, compiler version and compiler setting combination respectively, and report the comparative results across the neural network models as well as against existing function level methods that support the detection of corresponding compiler settings.

Summary

INTRODUCTION

In the software production process, diverse toolchains and toolchain settings can be adopted to transform the source code to the final binary.

PROBLEM OVERVIEW

The goal of compiler identification is to recover, from the final binary, the compiler-related details applied in processing the program source code. The feasibility of this task lies in the usually significant differences imposed by different compilers and optimization settings. Inspired by the success and feature learning power of deep learning in various program analysis tasks [16], [23], [34], [45], [48], [51], in this work we use typical neural network structures to automatically capture and select the scattered, subtle, yet significant features that manifest compiler settings, so as to achieve effective and efficient fine-grained compiler identification with less human intervention.
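As a rough illustration of how normalized instruction sequences and compiler settings could be prepared for a sequence-oriented classifier, the sketch below maps tokens to integer ids, pads sequences to a fixed length, and builds a combined class label. The reserved ids, the helper names, and the label format are all assumptions made for this example and do not come from the paper.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids for padding and unseen tokens (assumed convention)


def build_vocab(sequences, min_count=1):
    """Assign an integer id (starting at 2) to every sufficiently frequent token."""
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = {}
    for tok, count in sorted(counts.items()):  # sorted for deterministic ids
        if count >= min_count:
            vocab[tok] = len(vocab) + 2
    return vocab


def encode(seq, vocab, max_len):
    """Truncate or pad a token sequence to max_len integer ids."""
    ids = [vocab.get(tok, UNK) for tok in seq[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def joint_label(family, opt_level, version=None):
    """Combine compiler settings into one class label (hypothetical format)."""
    parts = [family, opt_level]
    if version:
        parts.append(version)
    return "-".join(parts)
```

Treating each combination of settings as one class is only one possible design; separate classifiers per setting would be an equally plausible reading of the task described above.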

Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 52	License type: CC BY 4.0

Similar Papers

Characterizing Soft Error Vulnerability of CPUs Across Compiler Optimizations and Microarchitectures
George Papadimitriou ... Dimitris Gizopoulos
01 Nov 2021

Tackling Androids Native Library Malware with Robust, Efficient and Accurate Similarity Measures
Anatoli Kalysch ... Tilo Müller
27 Aug 2018

Understand Code Style: Efficient CNN-Based Compiler Optimization Recognition System
Shouguo Yang ... Yuan Ma
01 May 2019

Cole
Kenneth Hoste ... Lieven Eeckhout
06 Apr 2008
