Abstract

While semantic parsing has been an important problem in natural language processing for decades, recent years have seen a wide interest in automatic generation of code from text. We propose an alternative problem to code generation: labelling the algorithmic solution for programming challenges. While this may seem an easier task, we highlight that current deep learning techniques are still far from offering a reliable solution. The contributions of the paper are twofold. First, we propose a large multi-modal dataset of text and code pairs consisting of algorithmic challenges and their solutions, called AlgoLabel. Second, we show that vanilla deep learning solutions need to be greatly improved to solve this task and we propose a dual text-code neural model for detecting the algorithmic solution type for a programming challenge. While the proposed text-code model increases the performance of using the text or code alone, the improvement is rather small highlighting that we require better methods to combine text and code features.

Highlights

  • Recent years have seen an increased interest in semantic parsing, especially due to the advances of data-driven methods using large corpora and deep learning architectures [1,2]

  • We introduce AlgoLabel, a multi-modal text-code dataset that contains both problem statements and C++ code snippets with solutions for the problems

  • The performance of a pre-trained Bidirectional Encoder Representations from Transformers (BERT) base model fine-tuned on our dataset was poor, with a F1 score of 0.40

Read more

Summary

Introduction

Recent years have seen an increased interest in semantic parsing, especially due to the advances of data-driven methods using large corpora and deep learning architectures [1,2]. Code generation requires a more complex representation using a programming language that has a more complex syntax, and a larger number of tokens and very difficult semantics and high level programming constructs. We consider that in order to be able to efficiently generate code from natural language, it is first important to solve some intermediate tasks related to high level programming constructs, such as algorithmic thinking, data structures, and algorithm design techniques. To this extent, a first step is to be able to understand the algorithmic solution required to solve a programming challenge. We define a multi-label classification task using a large set of challenges gathered from several relevant online resources

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.