Who evaluates the evaluators? On automatic metrics for assessing AI-based offensive code generators

Pietro Liguori, Bojan Cukic, Domenico Cotroneo, Cristina Improta, Roberto Natella

doi:10.1016/j.eswa.2023.120073

Abstract

AI-based code generators are an emerging solution for automatically writing programs starting from descriptions in natural language, by using deep neural networks (Neural Machine Translation, NMT). In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks. Unfortunately, the evaluation of code generators still faces several issues. The current practice uses output similarity metrics, i.e., automatic metrics that compute the textual similarity of generated code with ground-truth references. However, it is not clear what metric to use, and which metric is most suitable for specific contexts. This work analyzes a large set of output similarity metrics on offensive code generators. We apply the metrics on two state-of-the-art NMT models using two datasets containing offensive assembly and Python code with their descriptions in the English language. We compare the estimates from the automatic metrics with human evaluation and provide practical insights into their strengths and limitations.
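
To make the notion of an output similarity metric concrete, here is a minimal sketch of two simple ones: exact match and a token-level edit-distance similarity. The function names, the whitespace tokenization, and the sample snippets are assumptions chosen for illustration. They are not taken from the paper and may differ from the metrics and settings it analyzes.

```python
def exact_match(generated: str, reference: str) -> float:
    """Return 1.0 if the token sequences are identical, else 0.0."""
    return 1.0 if generated.split() == reference.split() else 0.0


def token_edit_distance(a: list, b: list) -> int:
    """Levenshtein distance between two token lists (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete tok_a
                curr[j - 1] + 1,                   # insert tok_b
                prev[j - 1] + (tok_a != tok_b),    # substitute or keep
            ))
        prev = curr
    return prev[-1]


def edit_similarity(generated: str, reference: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical tokens."""
    gen_tokens, ref_tokens = generated.split(), reference.split()
    longest = max(len(gen_tokens), len(ref_tokens))
    if longest == 0:
        return 1.0  # two empty snippets are trivially identical
    return 1.0 - token_edit_distance(gen_tokens, ref_tokens) / longest


# Hypothetical example: both snippets set eax to zero, yet they share
# only one of three whitespace-separated tokens.
reference = "xor eax, eax"
generated = "mov eax, 0"
print(exact_match(generated, reference))
print(edit_similarity(generated, reference))
```

In this illustrative pair the generated snippet scores low on both metrics even though it also zeroes the register, which is the kind of gap between textual similarity and human judgment that motivates comparing the metrics with human evaluation.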


Journal: Expert Systems with Applications	Publication Date: Sep 1, 2023
Citations: 8	License type: cc-by

Similar Papers

MarianCG: a code generation transformer model inspired by machine translation
Ahmed S Soliman ... Samir I Shaheen
Journal of Engineering and Applied Science | VOL. 69
22 Nov 2022

Multilingual Neural Translation
14 Feb 2020

Can we generate shellcodes via natural language? An empirical study
Pietro Liguori ... Roberto Natella
Automated Software Engineering | VOL. 29
05 Mar 2022

Evaluation of English–Slovak Neural and Statistical Machine Translation
Lucia Benkova ... Dasa Munkova
Applied Sciences | VOL. 11
25 Mar 2021
