Abstract

Code generation, a prominent application area of deep learning models for text, consists of two fields: code-to-code and text-to-code generation. A recent approach, GraphCodeBERT, uses a code graph called data flow and showed a clear performance improvement. Its base architecture is bidirectional encoder representations from transformers (BERT), which uses the encoder part of a transformer. The generative pre-trained transformer (GPT), another multi-layer transformer architecture, uses the decoder part instead and has shown strong performance. In this study, we investigate the improvement from code graphs with several variants on GPT-2, referring to the abstract syntax tree to collect the features of variables in the code. We mainly focus on GPT-2 with additional code-graph features that allow the model to learn the effect of the data flow. The experimental phase is divided into two parts: fine-tuning of the existing GPT-2 model, and pre-training from scratch using code data. When pre-trained from scratch with enough data, the new model outperforms the variant that relies on the code graph.
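
To make the notion of a data-flow code graph concrete, the sketch below is our own illustration rather than the authors' pipeline: it extracts "value-comes-from" edges between variables in a Python snippet using the standard ast module (the function name data_flow_edges and the assignment-only traversal are simplifying assumptions).

    # A minimal sketch (not the authors' pipeline): extract "comes-from" data-flow
    # edges for variables in a Python snippet using the standard ast module.
    import ast

    def data_flow_edges(source: str):
        """Return (use, target) pairs: each variable read on the right-hand side
        of an assignment is linked to the variable it is assigned to."""
        edges = []
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Assign):
                targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
                reads = [n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)]
                for target in targets:
                    for read in reads:
                        edges.append((read, target))  # value of `read` flows into `target`
        return edges

    print(data_flow_edges("x = a + b\ny = x * 2"))
    # [('a', 'x'), ('b', 'x'), ('x', 'y')]

GraphCodeBERT attaches variable relations of this kind to the input so the model can learn the effect of the data flow; the GPT-2 variants studied here build on the same idea.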

Highlights

  • Deep learning has advanced the performance of models for code generation significantly

  • The current state-of-the-art approach, GraphCodeBERT, trained bidirectional encoder representations from transformers (BERT) on CodeXGLUE data together with a code graph whose semantic data flow was extracted from the data set

  • We present GraphCodeGPT, which uses data flow extracted from code

Summary

Introduction

Deep learning has significantly advanced the performance of models for code generation. There have been many approaches to improve text-to-code generation performance, such as the sequence-to-tree model with attention [1], the dual task of code summarization [2], and pre-training code representations with data flow (GraphCodeBERT) [3]. Based on the different arrangements for using the basic ontological concept of data flow, we expect GPT-2 to produce different effects, which we observe in our experiments. We believe that these two methods can improve code-generation performance. We carry out this experiment in two phases. Building on stage one, a model is pre-trained from scratch and trained with the code graph. Using this model, we investigate the performance of the pre-trained and fine-tuned models. The contributions of this paper are: (1) We propose a pre-trained and fine-tuned model of text and code with several features of variables, and a model pre-trained from scratch based on GPT-2. (2) The detailed variable features, together with the basic ontological concept of data flow, are trained and their effect is observed. (3) The effects of the several variable conditions are evaluated.
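
As a rough sketch of the fine-tuning stage described above (our assumption about the input packing, not the paper's exact recipe), the example below concatenates a natural-language description, the code, and the variables taken from an extracted data flow into one sequence and runs a single causal-language-modeling step on a pre-trained GPT-2 from the Hugging Face transformers library; the <code> and <flow> separators and the encode_example helper are hypothetical.

    # Hedged sketch: fine-tune GPT-2 on text + code + data-flow variables packed
    # into a single token sequence (separators are illustrative, not the paper's).
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    def encode_example(text, code, flow_vars):
        """Pack description, code, and data-flow variables into one sequence."""
        sequence = text + " <code> " + code + " <flow> " + " ".join(flow_vars)
        return tokenizer(sequence, return_tensors="pt", truncation=True, max_length=512)

    batch = encode_example(
        "return the sum of two numbers",
        "def add(a, b): return a + b",
        ["a", "b"],                                    # variables from the extracted data flow
    )
    outputs = model(**batch, labels=batch["input_ids"])  # causal LM loss over the packed sequence
    outputs.loss.backward()                              # an optimizer step would follow

Pre-training from scratch (the second phase) would use the same input packing but start from a randomly initialized GPT-2 configuration instead of the released checkpoint.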

Related Work
Code Generation by LSTM
Code Generation by Transformer
Data Set Description
Code Generation
Data Flow
GraphCodeGPT
Pre-Training Code Graph from Scratch
Conclusions
