Abstract

Code generation, a prominent application area of deep learning models for text, consists of two fields: code-to-code and text-to-code generation. A recent approach, GraphCodeBERT, uses a code graph called data flow and showed a clear performance improvement. Its base architecture is bidirectional encoder representations from transformers (BERT), which uses the encoder part of a transformer. The generative pre-trained transformer (GPT), another multi-layer transformer architecture, uses the decoder part instead and has shown strong performance. In this study, we investigate the improvement from code graphs with several variants on GPT-2, referring to the abstract syntax tree to collect the features of variables in the code. We mainly focus on GPT-2 with additional code-graph features that allow the model to learn the effect of the data flow. The experimental phase is divided into two parts: fine-tuning of the existing GPT-2 model, and pre-training from scratch using code data. When pre-trained from scratch with enough data, the new model outperforms the variant that relies on the code graph.
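
To make the notion of a data-flow code graph concrete, the sketch below is our own illustration rather than the authors' pipeline: it extracts "value-comes-from" edges between variables in a Python snippet using the standard ast module (the function name data_flow_edges and the assignment-only traversal are simplifying assumptions).

    # A minimal sketch (not the authors' pipeline): extract "comes-from" data-flow
    # edges for variables in a Python snippet using the standard ast module.
    import ast

    def data_flow_edges(source: str):
        """Return (use, target) pairs: each variable read on the right-hand side
        of an assignment is linked to the variable it is assigned to."""
        edges = []
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Assign):
                targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
                reads = [n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)]
                for target in targets:
                    for read in reads:
                        edges.append((read, target))  # value of `read` flows into `target`
        return edges

    print(data_flow_edges("x = a + b\ny = x * 2"))
    # [('a', 'x'), ('b', 'x'), ('x', 'y')]

GraphCodeBERT attaches variable relations of this kind to the input so the model can learn the effect of the data flow; the GPT-2 variants studied here build on the same idea.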

Highlights

  • Deep learning has advanced the performance of models for code generation significantly

  • The current state-of-the-art approach, GraphCodeBERT, trained bidirectional encoder representations from transformers (BERT) on CodeXGLUE data together with a code graph whose semantic data flow was extracted from the data set

  • We present GraphCodeGPT, which uses data flow extracted from code

Summary

Introduction

Deep learning has significantly advanced the performance of models for code generation. There have been many approaches to improve text-to-code generation performance, such as the sequence-to-tree model with attention [1], the dual task of code summarization [2], and pre-training code representations with data flow (GraphCodeBERT) [3]. Based on the different arrangements for using the basic ontological concept of data flow, we expect GPT-2 to produce different effects, which we observe in our experiments. We believe that these two methods can improve code-generation performance. We carry out this experiment in two phases. Building on stage one, a model is pre-trained from scratch and trained with the code graph. Using this model, we investigate the performance of the pre-trained and fine-tuned models. The contributions of this paper are: (1) We propose a pre-trained and fine-tuned model of text and code with several features of variables, and a model pre-trained from scratch based on GPT-2. (2) The detailed variable features, together with the basic ontological concept of data flow, are trained and their effect is observed. (3) The effects of the several variable conditions are evaluated.
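
As a rough sketch of the fine-tuning stage described above (our assumption about the input packing, not the paper's exact recipe), the example below concatenates a natural-language description, the code, and the variables taken from an extracted data flow into one sequence and runs a single causal-language-modeling step on a pre-trained GPT-2 from the Hugging Face transformers library; the <code> and <flow> separators and the encode_example helper are hypothetical.

    # Hedged sketch: fine-tune GPT-2 on text + code + data-flow variables packed
    # into a single token sequence (separators are illustrative, not the paper's).
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    def encode_example(text, code, flow_vars):
        """Pack description, code, and data-flow variables into one sequence."""
        sequence = text + " <code> " + code + " <flow> " + " ".join(flow_vars)
        return tokenizer(sequence, return_tensors="pt", truncation=True, max_length=512)

    batch = encode_example(
        "return the sum of two numbers",
        "def add(a, b): return a + b",
        ["a", "b"],                                    # variables from the extracted data flow
    )
    outputs = model(**batch, labels=batch["input_ids"])  # causal LM loss over the packed sequence
    outputs.loss.backward()                              # an optimizer step would follow

Pre-training from scratch (the second phase) would use the same input packing but start from a randomly initialized GPT-2 configuration instead of the released checkpoint.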

Related Work
Code Generation by LSTM
Code Generation by Transformer
Data Set Description
Code Generation
Data Flow
GraphCodeGPT
Pre-Training Code Graph from Scratch
Conclusions
