Instruction2vec: Efficient Preprocessor of Assembly Code to Detect Software Weakness with CNN

Yongjun Lee, Hyun Kwon, Sang-Hoon Choi, Sung Hoon Baek, Seung-Ho Lim, Ki-Woong Park

doi:10.3390/app9194086

Abstract

Potential software weakness, which can lead to exploitable security vulnerabilities, continues to pose a risk to computer systems. According to Common Vulnerabilities and Exposures (CVE), 14,714 vulnerabilities were reported in 2017, more than twice the number reported in 2016. Automated vulnerability detection has been recommended as a way to find vulnerabilities efficiently. Among detection techniques, static binary analysis detects software weakness based on existing patterns or rules, which makes it difficult to add and patch rules whenever an unknown vulnerability is encountered. To overcome this limitation, we propose a new method, Instruction2vec, an improved static binary analysis technique using machine learning. Our framework consists of two steps: (1) it models assembly code efficiently using Instruction2vec, which is based on Word2vec; and (2) it learns the features of software weakness code using the feature extraction of Text-CNN, without creating patterns or rules, and detects new software weakness. To assess the efficiency of Instruction2vec, we compared the preprocessing performance of three frameworks: Instruction2vec, Word2vec, and Binary2img. We used the Juliet Test Suite, particularly the part related to Common Weakness Enumeration (CWE)-121, for training and Securely Taking On New Executable Software of Uncertain Provenance (STONESOUP) for testing. Experimental results show that the proposed scheme can detect software vulnerabilities in assembly code with an accuracy of 91%.
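As a rough illustration of step (1), the sketch below turns each assembly instruction into a fixed-length vector by looking up one embedding per token and zero-padding unused operand slots. It is not the authors' implementation: the abstract does not describe the vector layout, so the three-slot layout (opcode plus up to two operands), the embedding width, and the helper names `tokenize`, `build_table`, and `instruction_vector` are all assumptions, and random vectors stand in for a trained Word2vec table.

```python
import random

EMBED_DIM = 4  # toy embedding width (assumption, not from the paper)
SLOTS = 3      # opcode + up to two operands (assumed layout)


def tokenize(instruction):
    """Split 'mov ebp, esp' into ['mov', 'ebp', 'esp']."""
    opcode, _, rest = instruction.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return [opcode] + operands


def build_table(instructions, seed=0):
    """Stand-in for a trained Word2vec table: one random vector per token."""
    rng = random.Random(seed)
    table = {}
    for ins in instructions:
        for tok in tokenize(ins):
            if tok not in table:
                table[tok] = [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
    return table


def instruction_vector(instruction, table):
    """Concatenate token vectors slot by slot, zero-padding empty slots."""
    tokens = tokenize(instruction)[:SLOTS]
    zero = [0.0] * EMBED_DIM
    vec = []
    for slot in range(SLOTS):
        if slot < len(tokens):
            vec.extend(table.get(tokens[slot], zero))
        else:
            vec.extend(zero)
    return vec


code = ["push ebp", "mov ebp, esp", "sub esp, 0x40", "ret"]
table = build_table(code)
# One row per instruction; every row has SLOTS * EMBED_DIM entries.
matrix = [instruction_vector(ins, table) for ins in code]
```

Because every row has the same width, a function's instruction sequence becomes a matrix that a convolutional text classifier such as Text-CNN can consume directly, which is the role the abstract assigns to step (2).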

Highlights

Potential software weakness that can lead to exploitable security vulnerabilities continues to pose a risk to computer systems.
We propose an improved static binary analysis technique that automatically learns software weakness using machine learning, to overcome the limitations of pattern-based analysis.
Our framework combines machine learning with static binary analysis, a combination that can produce strong synergy when trained with a dataset of software weakness.

Summary

Introduction

Potential software weakness that can lead to exploitable security vulnerabilities continues to pose a risk to computer systems. Static binary analysis detects vulnerabilities without executing binary code. Most static binary analysis processes generate a model by abstracting code and then match the generated model against an existing pattern or rule [2,3,4]. This pattern-based approach cannot keep pace with the rapidly growing number of vulnerabilities. We propose a framework that overcomes this limitation by combining machine learning with static binary analysis, a combination that can produce strong synergy when trained with a dataset of software weakness. Our framework can improve its performance by learning from a growing amount of data, whereas existing static binary analysis provides the same performance regardless of dataset size. The first step in most static binary analysis methods is the modeling of assembly code, that is, converting binary code into an intermediate language or abstracting the code.
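To make the pattern-based workflow concrete, here is a minimal sketch of the "abstract, then match" idea: each instruction is abstracted by replacing registers and constants with placeholders, and the result is compared against a rule table. The rule name, the regular expressions, and the sample listing are hypothetical and chosen only for illustration; real analyzers use far richer models.

```python
import re


def abstract(instruction):
    """Replace concrete constants and 32-bit registers with placeholders."""
    text = instruction.lower().strip()
    text = re.sub(r"\b0x[0-9a-f]+\b|\b\d+\b", "IMM", text)
    text = re.sub(r"\be(?:ax|bx|cx|dx|si|di|bp|sp)\b", "REG", text)
    return re.sub(r"\s+", " ", text)


# Hypothetical rule table: flags calls to a few unbounded copy routines.
RULES = {
    "unbounded-copy": re.compile(r"^call (?:strcpy|gets|sprintf)$"),
}


def scan(instructions):
    """Return (index, rule name) for every instruction matching a rule."""
    findings = []
    for index, ins in enumerate(instructions):
        model = abstract(ins)
        for name, pattern in RULES.items():
            if pattern.match(model):
                findings.append((index, name))
    return findings


listing = ["lea eax, [ebp-0x40]", "push eax", "call strcpy", "call memcpy"]
findings = scan(listing)
```

In this toy listing only `call strcpy` is reported. A risky `call memcpy` goes unnoticed until someone writes a rule for it, which is the maintenance burden the introduction attributes to pattern-based analysis.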

Journal: Applied Sciences	Publication Date: Sep 30, 2019
Citations: 28	License type: CC BY 4.0
