Abstract
Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. In particular, writing shellcodes is especially time-consuming and technically challenging, as they are written in assembly language. In this work, we address the task of automatically generating shellcodes, starting purely from descriptions in natural language, by proposing an approach based on Neural Machine Translation (NMT). We then present an empirical study using a novel dataset (Shellcode_IA32), which consists of 3200 assembly code snippets of real Linux/x86 shellcodes from public databases, annotated using natural language. Moreover, we propose novel metrics to evaluate the accuracy of NMT at generating shellcodes. The empirical analysis shows that NMT can generate assembly code snippets from natural language with high accuracy and that in many cases it can generate entire shellcodes with no errors.
Highlights
Introduction: Software security plays a crucial role in our society. Software vendors and users are in an arms race against cybercriminals, investing significant efforts towards identifying vulnerabilities and patching them, sometimes releasing updates mere hours after a release.
Nowadays, software security plays a crucial role in our society.
We propose a novel approach for translating natural language into shellcode in assembly language, based on neural machine translation (NMT).
Summary
Software security plays a crucial role in our society. Software vendors and users are in an arms race against cybercriminals, investing significant efforts towards identifying vulnerabilities and patching them, sometimes releasing updates mere hours after a release. Code-injection attacks have been increasing drastically with the growth of applications exposed to the Internet (Ray and Ligatti 2012), as shown by statistics from the Common Vulnerabilities and Exposures (CVE) database (CVE 2021). These attacks deliver and run malicious code (payload) on the victim's machine, in order to give attackers control of the target system.
Machine translation refers to the translation of one language into another by means of a computerized system (Dorr et al. 1999). It is defined as an optimization problem, which maximizes the conditional probability that a sentence ω(t) in the target language is the likely translation of a sentence ω(s) in the source language, by using a scoring function ψ. In an encoder-decoder NMT model, the encoder network maps the source sentence to an encoding, and the decoder network converts the encoding into a sentence in the target language by defining the conditional probability p(ω(t) | ω(s)).
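The optimization problem described in the summary is commonly written in the form below. This is a sketch of the standard formulation, not necessarily the exact notation of the paper: the argmax form follows from the description of ψ as a scoring function to be maximized, while the choice ψ = log p, the encoding symbol z, the sentence length M, and the autoregressive factorization of the decoder are assumptions introduced here for illustration.

```latex
% Translation as search for the highest-scoring target sentence
\hat{\omega}^{(t)} = \operatorname*{arg\,max}_{\omega^{(t)}} \; \psi\!\left(\omega^{(s)}, \omega^{(t)}\right)

% Assumed probabilistic scoring function
\psi\!\left(\omega^{(s)}, \omega^{(t)}\right) = \log p\!\left(\omega^{(t)} \mid \omega^{(s)}\right)

% Assumed encoder-decoder factorization: z is the encoding of the source
% sentence, M is the number of tokens in the target sentence
z = \mathrm{ENCODE}\!\left(\omega^{(s)}\right), \qquad
p\!\left(\omega^{(t)} \mid \omega^{(s)}\right)
  = \prod_{m=1}^{M} p\!\left(\omega^{(t)}_{m} \mid \omega^{(t)}_{1:m-1}, z\right)
```

Under these assumptions, the decoder emits the target sentence (here, an assembly snippet) one token at a time, with each token conditioned on the tokens generated so far and on the encoding of the natural-language description.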