Recent Advances in Intelligent Source Code Generation: A Survey on Natural Language Based Studies.

Chen Yang, Changqing Yin, Yan Liu

doi:10.3390/e23091174

Abstract

Source Code Generation (SCG) is a prevalent research field in automated software engineering that maps specific descriptions to various sorts of executable code. Through numerous intensive studies, diverse SCG types that integrate different scenarios and contexts continue to emerge. As the ultimate purpose of SCG, Natural Language-based Source Code Generation (NLSCG) is growing into an attractive and challenging field because of the expressiveness and extremely high abstraction of its input. Booming large-scale datasets generated from open-source code repositories and Q&A resources, innovation in machine learning algorithms, and growth in computing capacity make the NLSCG field promising and create more opportunities to implement and refine models. We also observed a recent surge of interest in NLSCG-relevant studies, which present quite varied technical schools. However, many studies are bound to specific datasets with customization issues, producing occasional successful solutions with tentative technical methods. There is no systematic study to explore and promote the further development of this field. We carried out a systematic literature survey and tool research to find potential improvement directions. First, we position the role of NLSCG among various SCG genres and specify the generation context empirically via software development domain knowledge and programming experience. Second, we explore the selected studies collected by a thoughtfully designed snowballing process, clarify the NLSCG field, and understand the NLSCG problem, which lays a foundation for our subsequent investigation. Third, we model the research problems from technical focus and adaptive challenges, and elaborate insights gained from the NLSCG research backlog. Finally, we summarize the latest technology landscape over the transformation model and depict the critical tactics used in the essential components and their correlations. This research addresses the challenges of bridging the gap between natural language processing and source code analytics, outlines different dimensions of NLSCG research concerns and technical utilities, and shows a bounded technical context of NLSCG to facilitate more future studies in this promising area.

Highlights

As the cost of data ingestion, storage and computation continues to decrease, applying AI in practice is becoming the focus of the whole IT industry.
The challenges confronted by natural language-based source code generation (NLSCG) are foreseeable; we tentatively propose the definition of NLSCG as follows: Given a specific problem context C (C can not be specified in the code snippet generation and program synthesis task types, refer to Section 4.3) and a natural language description NL to C, NLSCG converts the input NL into the output executable source code (SC) corresponding to that NL.
Source Code Generation (SCG) has been studied for a long time, and NLSCG has grown in popularity as deep learning techniques have matured and been more widely adopted.
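
The tentative definition above can be read as a function from a description NL, plus an optional context C, to executable source code SC. The following is a minimal sketch of that interface only, not the method of any surveyed study. The names NLSCGInput, TOY_RULES, generate and run are hypothetical, and a lookup table stands in for a learned model purely for illustration.

```python
import ast
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NLSCGInput:
    """Input of the NLSCG task as defined above (hypothetical container)."""
    nl: str                        # natural language description NL
    context: Optional[str] = None  # problem context C; absent for some task types


# Assumption: a tiny lookup table plays the role of the transformation model.
# A real system would use a learned model here.
TOY_RULES = {
    "add two numbers": "def solve(a, b):\n    return a + b\n",
    "reverse a string": "def solve(s):\n    return s[::-1]\n",
}


def generate(inp: NLSCGInput) -> str:
    """Map (C, NL) to source code SC and check that SC is well-formed Python."""
    key = inp.nl.strip().lower()
    if key not in TOY_RULES:
        raise KeyError(f"no rule for description: {inp.nl!r}")
    # Here the context C is modelled as existing source code placed before the output.
    prefix = inp.context + "\n" if inp.context else ""
    sc = prefix + TOY_RULES[key]
    ast.parse(sc)  # raises SyntaxError if the generated code is not valid Python
    return sc


def run(sc: str, *args):
    """Execute the generated code and call its solve() entry point."""
    namespace = {}
    exec(sc, namespace)
    return namespace["solve"](*args)


if __name__ == "__main__":
    code = generate(NLSCGInput("Add two numbers"))
    print(run(code, 2, 3))
```

The syntax check and the run helper reflect the requirement that the output be executable; a fuller evaluation would also test whether the code matches the intent of NL.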

Summary

Introduction

As the cost of data ingestion, storage and computation continues to decrease, applying AI in practice is becoming the focus of the whole IT industry. It is urgent to clarify the current development status of NLSCG, namely, the appropriate datasets, the essential algorithms and representative architecture of the transformation model, the existing bottlenecks, the enabling factors, and the potential directions for improvement. Based on these motivations, we investigate the current state and future trends, summarize representative datasets and tasks, and gain insights from the research backlog of NLSCG. NLSCG processing techniques can be applied to commercial automated software platforms, enabling us to establish natural language interfaces to executable source code, which can reduce learning and training costs to a certain extent. Research in this field is significant because the lessons learned can be applied to similar scenarios that transform abstract and fuzzy descriptions into highly structured, constrained representations.

Research Journey and Context

Source Code Generation Genres

Problem Understanding Process

Snowballing Process

Studies Analysis

Natural Language-Based Intelligent Code Generation Tasks

Source Code Generation Relevant Datasets

Insights Gained from NLSCG Research Backlog

Language Characteristics

Source Code

Natural Language

Asymmetries between Natural Language and Source Code

Target Concerns

Portability

Generalizability

Accuracy

Spuriousness

Training

Dataset

Training Features

Prior Knowledge

Context

Ineffective Domain Knowledge

Limited Context

Perspectives on NLSCG Latest Technology Landscape

Findings

Conclusions and Future Directions


Journal: Entropy	Publication Date: Sep 7, 2021
Citations: 6	License type: CC BY 4.0


