Abstract

Recently, neural techniques have been used to generate source code automatically. While promising for declarative languages, these approaches achieve much poorer performance on datasets for imperative languages. Since a declarative language is typically embedded in an imperative language (i.e., turducken-style programming) in real-world software development, the promising results on declarative languages can hardly lead to a significant reduction of manual software development effort. In this paper, we define a new code generation task: given a natural language comment, this task aims to generate a program in a base imperative language with an embedded declarative language. To our knowledge, this is the first turducken-style code generation task. For this task, we present Lyra: a dataset in Python with embedded SQL. This dataset contains 2,000 carefully annotated database manipulation programs collected from real-world projects. Each program is paired with both a Chinese comment and an English comment. In our experiment, we adopted Transformer, BERT-style, and GPT-style models as baselines. In the best setting, the GPT-style model achieves 24% and 25.5% AST exact-matching accuracy using Chinese and English comments, respectively. Therefore, we believe that Lyra provides a new challenge for code generation. Yet, overcoming this challenge may significantly boost the applicability of code generation techniques for real-world software development.
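
To make the task concrete, below is a minimal, hypothetical sketch (not an actual Lyra entry) of what a turducken-style program looks like: an English comment serving as the natural-language input, and imperative Python hosting declarative SQL as the target output. The table and column names (users, name, age) are illustrative assumptions.

```python
import sqlite3

# Comment (input): "Return the names of users older than the given age,
# ordered by name."
def users_older_than(conn: sqlite3.Connection, age: int) -> list[str]:
    cursor = conn.cursor()
    # Declarative SQL embedded in the imperative host language (Python).
    cursor.execute(
        "SELECT name FROM users WHERE age > ? ORDER BY name",
        (age,),
    )
    return [row[0] for row in cursor.fetchall()]
```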
