AlgoLabel: A Large Dataset for Multi-Label Classification of Algorithmic Challenges

Radu Cristian Alexandru Iacob, Vlad Cristian Monea, Andrei-Florin Ceapă, Dan Rădulescu, Ștefan Trăușan-Matu, Traian Rebedea

doi:10.3390/math8111995

Open Access

https://doi.org/10.3390/math8111995

Journal: Mathematics
Publication Date: Nov 9, 2020
Citations: 2
License type: CC BY 4.0

Affiliation: Polytechnic University of Bucharest

Abstract

While semantic parsing has been an important problem in natural language processing for decades, recent years have seen wide interest in the automatic generation of code from text. We propose an alternative problem to code generation: labelling the algorithmic solution for programming challenges. While this may seem an easier task, we highlight that current deep learning techniques are still far from offering a reliable solution. The contributions of the paper are twofold. First, we propose a large multi-modal dataset of text and code pairs consisting of algorithmic challenges and their solutions, called AlgoLabel. Second, we show that vanilla deep learning solutions need to be greatly improved to solve this task, and we propose a dual text-code neural model for detecting the algorithmic solution type for a programming challenge. While the proposed text-code model improves on the performance of using the text or code alone, the improvement is rather small, highlighting that we require better methods to combine text and code features.

Highlights

Recent years have seen an increased interest in semantic parsing, especially due to the advances of data-driven methods using large corpora and deep learning architectures [1,2].
We introduce AlgoLabel, a multi-modal text-code dataset that contains both problem statements and C++ code snippets with solutions for the problems.
The performance of a pre-trained Bidirectional Encoder Representations from Transformers (BERT) base model fine-tuned on our dataset was poor, with an F1 score of 0.40.

Summary

Introduction

Recent years have seen an increased interest in semantic parsing, especially due to the advances of data-driven methods using large corpora and deep learning architectures [1,2]. Code generation requires a more complex representation, because the target is a programming language with a more complex syntax, a larger number of tokens, very difficult semantics, and high-level programming constructs. We consider that in order to be able to efficiently generate code from natural language, it is first important to solve some intermediate tasks related to high-level programming constructs, such as algorithmic thinking, data structures, and algorithm design techniques. To this end, a first step is to be able to understand the algorithmic solution required to solve a programming challenge. We define a multi-label classification task using a large set of challenges gathered from several relevant online resources.
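
To make the multi-label setting concrete, the following is a minimal sketch, not the authors' code. It assumes a small hypothetical label set (the names in LABELS are illustrative and are not taken from the dataset) and shows how each challenge can carry several algorithm labels at once, encoded as a 0/1 vector. It also computes a micro-averaged F1 score. The averaging scheme behind the reported F1 of 0.40 is not stated here, so micro-averaging is an assumption made for illustration.

```python
from typing import Iterable, List, Sequence, Set

# Hypothetical label set, for illustration only; not the dataset's actual labels.
LABELS: List[str] = ["graphs", "dp", "greedy", "math"]


def to_multi_hot(tags: Iterable[str], labels: Sequence[str] = LABELS) -> List[int]:
    """Encode a set of tags as a 0/1 vector with one position per label."""
    tag_set: Set[str] = set(tags)
    return [1 if label in tag_set else 0 for label in labels]


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over every (challenge, label) pair."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    # With no positive labels and no positive predictions, return 0.0 by convention.
    return 2 * tp / denom if denom else 0.0


# Three toy challenges; a single challenge may have more than one label.
gold = [
    to_multi_hot(["graphs", "dp"]),
    to_multi_hot(["greedy"]),
    to_multi_hot(["math", "dp"]),
]
# Toy predictions from an imagined classifier.
pred = [
    to_multi_hot(["graphs"]),
    to_multi_hot(["greedy", "math"]),
    to_multi_hot(["dp"]),
]
score = micro_f1(gold, pred)
```

Unlike single-label classification, a prediction here can be partially right: the first toy challenge is credited for "graphs" and penalized for the missed "dp" label.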
