Abstract

Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. In particular, writing shellcodes is especially time-consuming and technically challenging, as they are written in assembly language. In this work, we address the task of automatically generating shellcodes, starting purely from descriptions in natural language, by proposing an approach based on Neural Machine Translation (NMT). We then present an empirical study using a novel dataset (Shellcode_IA32), which consists of 3200 assembly code snippets of real Linux/x86 shellcodes from public databases, annotated using natural language. Moreover, we propose novel metrics to evaluate the accuracy of NMT at generating shellcodes. The empirical analysis shows that NMT can generate assembly code snippets from natural language with high accuracy and that in many cases it can generate entire shellcodes with no errors.

Highlights

  • Nowadays, software security plays a crucial role in our society

  • We propose a novel approach for translating natural language into shellcode in assembly language, based on neural machine translation (NMT)

Summary

Introduction

Software security plays a crucial role in our society. Software vendors and users are in an arms race against cybercriminals, investing significant efforts towards identifying vulnerabilities and patching them, sometimes releasing updates mere hours after a release. Code-injection attacks have been drastically increasing with the growth of applications exposed to the Internet (Ray and Ligatti 2012), as shown by statistics from the Common Vulnerabilities and Exposures (CVE) database (CVE 2021). These attacks deliver and run malicious code (payload) on the victim's machine, in order to give attackers control of the target system.

Machine translation refers to the translation of one language into another by means of a computerized system (Dorr et al. 1999). It is defined as an optimization problem, which maximizes the conditional probability that a sentence $\omega^{(t)}$ in the target language is the likely translation of a sentence $\omega^{(s)}$ in the source language, by using a scoring function $\psi$: $\hat{\omega}^{(t)} = \arg\max_{\omega^{(t)}} \psi(\omega^{(s)}, \omega^{(t)})$. In an encoder-decoder architecture, the decoder network converts the encoding of the source sentence into a sentence in the target language by defining the conditional probability $p(\omega^{(t)} \mid \omega^{(s)})$.
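The view of translation as maximizing a scoring function can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the candidate snippets, the probabilities, and the names `CANDIDATE_SCORES`, `psi`, and `translate` are hypothetical and are not taken from the paper or its dataset. A real NMT system would compute the score with a trained encoder-decoder model and search over a far larger space of target sentences.

```python
import math
from typing import Sequence

# Hypothetical toy table: invented log-probabilities that a model might assign
# to candidate assembly snippets for one English description.
CANDIDATE_SCORES = {
    "push the contents of eax onto the stack": {
        "push eax": math.log(0.90),
        "pop eax": math.log(0.07),
        "mov eax, esp": math.log(0.03),
    },
}


def psi(source: str, target: str) -> float:
    """Toy scoring function: log p(target | source), or -inf if the pair is unknown."""
    return CANDIDATE_SCORES.get(source, {}).get(target, float("-inf"))


def translate(source: str, candidates: Sequence[str]) -> str:
    """Return the candidate target sentence that maximizes psi(source, candidate)."""
    return max(candidates, key=lambda target: psi(source, target))


if __name__ == "__main__":
    description = "push the contents of eax onto the stack"
    print(translate(description, ["pop eax", "push eax", "mov eax, esp"]))
```

Here the scoring function is simply a lookup of a log-probability, so taking the argmax over candidates is the same as picking the most probable translation under the toy table.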
