Can we generate shellcodes via natural language? An empirical study

Pietro Liguori, Bojan Cukic, Samira Shaikh, Erfan Al-Hossami, Domenico Cotroneo, Roberto Natella

doi:10.1007/s10515-022-00331-3

Abstract

Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. In particular, writing shellcodes is especially time-consuming and technically challenging, as they are written in assembly language. In this work, we address the task of automatically generating shellcodes, starting purely from descriptions in natural language, by proposing an approach based on Neural Machine Translation (NMT). We then present an empirical study using a novel dataset (Shellcode_IA32), which consists of 3200 assembly code snippets of real Linux/x86 shellcodes from public databases, annotated using natural language. Moreover, we propose novel metrics to evaluate the accuracy of NMT at generating shellcodes. The empirical analysis shows that NMT can generate assembly code snippets from natural language with high accuracy and that, in many cases, it can generate entire shellcodes with no errors.

Highlights

Software security plays a crucial role in our society. Software vendors and users are in an arms race against cybercriminals, investing significant efforts towards identifying vulnerabilities and patching them, sometimes releasing updates mere hours after a release.
We propose a novel approach for translating natural language into shellcode in assembly language, based on neural machine translation (NMT).

Summary

Introduction

Software security plays a crucial role in our society. Software vendors and users are in an arms race against cybercriminals, investing significant effort in identifying and patching vulnerabilities, sometimes releasing updates mere hours after a release. Code-injection attacks have increased sharply with the growth of applications exposed to the Internet (Ray and Ligatti 2012), as shown by statistics from the Common Vulnerabilities and Exposures (CVE) database (CVE 2021). These attacks deliver and run malicious code (the payload) on the victim's machine in order to give attackers control of the target system.

Machine translation refers to the translation of one language into another by means of a computerized system (Dorr et al. 1999). It is defined as an optimization problem: given a sentence $\omega^{(s)}$ in the source language, find the sentence $\omega^{(t)}$ in the target language that is its most likely translation, by maximizing a scoring function $\psi$:

$$\hat{\omega}^{(t)} = \operatorname*{argmax}_{\omega^{(t)}} \psi\left(\omega^{(s)}, \omega^{(t)}\right)$$

The decoder network then converts the encoding of the source sentence into a sentence in the target language by defining the conditional probability $p(\omega^{(t)} \mid \omega^{(s)})$.
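The optimization view of translation described above can be illustrated with a minimal Python sketch. It is not the authors' model: the candidate list, the per-token probabilities in `TOY_MODEL`, and the helper names `best_translation` and `toy_log_prob` are all hypothetical, chosen only to show how a scoring function ψ, here taken to be the log of the conditional probability of the target given the source, selects a translation by argmax.

```python
import math
from typing import Callable, Iterable

# A scoring function psi(source, target) -> float; higher means "better translation".
Score = Callable[[str, str], float]


def best_translation(source: str, candidates: Iterable[str], psi: Score) -> str:
    """Return the candidate target sentence that maximizes psi(source, target)."""
    return max(candidates, key=lambda target: psi(source, target))


# Hypothetical per-token conditional probabilities for two candidate outputs.
# The numbers are invented for illustration; a real NMT decoder would compute
# them with a neural network conditioned on the encoded source sentence.
TOY_MODEL = {
    ("clear the eax register", "xor eax, eax"): [0.9, 0.8, 0.9],
    ("clear the eax register", "mov eax, ebx"): [0.3, 0.8, 0.1],
}


def toy_log_prob(source: str, target: str) -> float:
    """log p(target | source) as the sum of per-token log-probabilities."""
    probs = TOY_MODEL.get((source, target))
    if probs is None:
        return float("-inf")  # unknown pairs are treated as impossible
    return sum(math.log(p) for p in probs)


if __name__ == "__main__":
    intent = "clear the eax register"
    print(best_translation(intent, ["xor eax, eax", "mov eax, ebx"], toy_log_prob))
```

A real system cannot enumerate every candidate sentence, so in practice the argmax is usually approximated with a search procedure such as greedy or beam decoding; the exhaustive `max` here is only workable because the toy candidate set has two entries.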

Journal: Automated Software Engineering
Publication Date: Mar 5, 2022
Citations: 11
License type: open-access

Similar Papers

Who evaluates the evaluators? On automatic metrics for assessing AI-based offensive code generators
Pietro Liguori ... Roberto Natella
Expert Systems with Applications | VOL. 225
01 Sep 2023

Can NMT understand me?
Pietro Liguori ... Simona De Vivo
21 May 2022

Multilingual Neural Translation
14 Feb 2020

EVIL: Exploiting Software via Natural Language
Pietro Liguori ... Samira Shaikh
01 Oct 2021
