Automated Data-Processing Function Identification Using Deep Neural Network

Hongyu Kuang, Xing Zhang, Ruilin Li, Chao Feng, Jian Wang

doi:10.1109/access.2020.2981537


Open Access

https://doi.org/10.1109/access.2020.2981537


Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 2	License type: CC BY 4.0

Affiliation: National University of Defense Technology

Abstract

The number of software vulnerabilities is increasing year by year. In the era of big data, data-processing software with many users attracts particular attention from attackers, so it is essential to improve the efficiency of discovering vulnerabilities in such software. We noticed that several problems in existing vulnerability-discovery techniques, such as fuzzing, symbolic execution, and taint analysis, are related to data-processing functions. In fuzzing, the target program performs two types of sanity checks: non-critical checks (NCC) and critical checks (CC). Such checks are usually hard to bypass, which leads to low code coverage. In symbolic execution, the constraint solver still struggles with the constraints produced by complex algorithms. In taint analysis, over-tainting and under-tainting are the key factors affecting the accuracy of the results. To address these problems, it is necessary to identify data-processing functions. Once these functions are identified, we can locate the sanity checks, ease the solving of complex constraints, and understand how taint propagates, all of which assist software vulnerability discovery and analysis. This paper proposes DPFI (data-processing function identification), a method for identifying data-processing functions with deep neural networks. We collected 37000 functions from GitHub and evaluated the method on this data set with several neural networks, among which the CNN performed best, with an $F_{1}$-score of 0.90. We then applied the trained model to CGC (Cyber Grand Challenge) data and real software. For CGC, we obtained 448 functions from 20 programs, of which 35 were identified as data-processing functions. For real software such as FFmpeg, 7zip, and jpeg, the precision reached 0.90 in every case and the $F_{1}$-score was above 0.87.
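
The abstract reports precision and $F_{1}$-score but not recall. As a reading aid, the following minimal sketch uses the standard definition of $F_{1}$ as the harmonic mean of precision and recall to show what recall those figures would imply. The function names are hypothetical, and treating the precision as exactly 0.90 is an assumption for illustration; none of this is taken from the paper's implementation.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_from_f1(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no valid recall for this precision/F1 pair")
    return f1 * precision / denom


# Assumption for illustration: precision is exactly 0.90 and F1 is exactly 0.87.
implied_recall = recall_from_f1(0.90, 0.87)
print(f"implied recall: {implied_recall:.3f}")
```

Because $F_{1}$ increases with recall when precision is fixed, an $F_{1}$-score above 0.87 at a precision of 0.90 would correspond to a recall above roughly 0.84 under these assumptions.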

Highlights

In the era of big data, a variety of data is produced every second
In 2018, among all Windows products affected by vulnerabilities, the Office products accounted for 17% and the Adobe products accounted for 2%
We proposed a method for identifying data-processing functions accurately and quickly based on convolutional neural networks

Summary

Introduction

In the era of big data, a variety of data is produced every second. With the continuous improvement of computing power, new forms of data processing keep emerging, and people’s dependence on data-processing software has gradually increased. We noticed that data-processing software with a large number of users, such as Adobe and Office products, attracts close attention from hackers. In 2018, among all Windows products affected by vulnerabilities, the Office products accounted for 17% and the Adobe products accounted for 2%. The Office products and the Adobe products accounted for the highest share of vulnerabilities, exceeding 80% [1].

