Learning lenient parsing & typing via indirect supervision

Toufique Ahmed, Premkumar Devanbu, Vincent J Hellendoorn

doi:10.1007/s10664-021-09942-y

Abstract

Both professional coders and teachers frequently deal with imperfect (fragmentary, incomplete, ill-formed) code. Such fragments are common on StackOverflow; students also frequently produce ill-formed code, for which instructors, TAs (or students themselves) must find repairs. In either case, the developer experience could be greatly improved if such code could somehow be parsed & typed; this makes such code more amenable to use within IDEs and allows early detection and repair of potential errors. We introduce a lenient parser, which can parse & type fragments, even ones with simple errors. Training a machine learner to leniently parse and type imperfect code requires a large training set including many pairs of imperfect code and its repair; such training sets are limited by human effort and curation. In this paper, we present a novel, indirectly supervised approach to train a lenient parser, without access to such human-curated training data. We leverage the huge corpus of mostly correct code available on GitHub, and the massive, efficient learning capacity of Transformer-based NN architectures. Using GitHub data, we first create a large dataset of fragments of code and corresponding tree fragments and type annotations; we then randomly corrupt the input fragments by seeding errors that mimic corruptions found in StackOverflow and student data. Using this data, we train high-capacity transformer models to overcome both fragmentation and corruption. With this novel approach, we can achieve reasonable performance on parsing & typing StackOverflow fragments; we also demonstrate that our approach performs well on shorter student error programs and achieves best-in-class performance on longer programs that have more than 400 tokens. We also show that by blending DeepFix and our tool, we could achieve 77% accuracy, which outperforms all previously reported student error correction tools.
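
The error-seeding step described in the abstract can be pictured with a minimal Python sketch. It is an illustration only, not the authors' pipeline: the abstract does not specify the error model, so the `FRAGILE_TOKENS` set, the drop-or-duplicate rule, and the names `corrupt` and `make_pairs` are all hypothetical. The sketch takes already-tokenized, correct fragments and returns (corrupted, original) pairs of the kind a model could be trained on.

```python
import random

# Hypothetical set of delimiter tokens to perturb. The abstract does not
# list which errors are seeded, so this choice is an assumption.
FRAGILE_TOKENS = {";", "(", ")", "{", "}"}


def corrupt(tokens, rate, rng):
    """Return a copy of `tokens` with some delimiters dropped or duplicated."""
    out = []
    for tok in tokens:
        if tok in FRAGILE_TOKENS and rng.random() < rate:
            if rng.random() < 0.5:
                continue  # drop the token, e.g. a missing semicolon
            out.extend([tok, tok])  # duplicate the token, e.g. a stray brace
        else:
            out.append(tok)
    return out


def make_pairs(fragments, rate=0.1, seed=0):
    """Pair each clean fragment with a randomly corrupted version of itself."""
    rng = random.Random(seed)
    return [(corrupt(frag, rate, rng), frag) for frag in fragments]
```

Because the clean fragment is kept as the target, no human-written repair is needed for any pair, which is the sense in which the supervision is indirect.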

Journal: Empirical Software Engineering
Publication Date: Mar 1, 2021
Citations: 9
