Syntax and sensibility: Using language models to detect and correct syntax errors

Eddie Antonio Santos, Jose Nelson Amaral, Joshua Charles Campbell, Abram Hindle, Dhvani Patel

doi:10.1109/saner.2018.8330219

Abstract

Syntax errors are made by novice and experienced programmers alike; however, novice programmers lack the years of experience that help them quickly resolve these frustrating errors. Standard LR parsers are of little help, typically resolving syntax errors and their precise location poorly. We propose a methodology that locates where syntax errors occur and suggests possible changes to the token stream that can fix the error identified. This methodology finds syntax errors by using language models trained on correct source code to find tokens that seem out of place. Fixes are synthesized by consulting the language models to determine which tokens are more likely at the estimated error location. We compare n-gram and LSTM (long short-term memory) language models for this task, each trained on a large corpus of Java code collected from GitHub. Unlike prior work, our methodology does not require that the problem source code come from the same domain as the training data. We evaluated against a repository of real student mistakes. Our tools are able to find a syntactically valid fix within their top-2 suggestions, often producing the exact fix that the student used to resolve the error. The results show that this tool and methodology can locate and suggest corrections for syntax errors. Our methodology is of practical use to all programmers, but will be especially useful to novices frustrated with incomprehensible syntax errors.
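
To make the idea in the abstract concrete, here is a minimal sketch and not the authors' implementation. It assumes a toy bigram model with add-one smoothing in place of the paper's n-gram and LSTM models, a tiny hand-written corpus in which identifiers are abstracted to `ID`, and a ranking by mean per-token log-probability. The names `BigramModel`, `locate_error`, and `suggest_fixes` are hypothetical. The sketch flags the token the model finds least likely, then ranks single-token insertions, substitutions, and deletions at that position.

```python
from collections import Counter, defaultdict
import math

START, END = "<s>", "</s>"


class BigramModel:
    """Add-one smoothed bigram model over tokens of (assumed) correct code."""

    def __init__(self, corpus):
        self.counts = defaultdict(Counter)
        self.vocab = set()
        for tokens in corpus:
            self.vocab.update(tokens)
            padded = [START, *tokens, END]
            for prev, cur in zip(padded, padded[1:]):
                self.counts[prev][cur] += 1

    def prob(self, prev, cur):
        seen = self.counts.get(prev, Counter())
        # The extra +1 in the denominator accounts for the END marker.
        return (seen[cur] + 1) / (sum(seen.values()) + len(self.vocab) + 1)

    def transition_log_probs(self, tokens):
        padded = [START, *tokens, END]
        return [math.log(self.prob(p, c)) for p, c in zip(padded, padded[1:])]

    def mean_log_prob(self, tokens):
        scores = self.transition_log_probs(tokens)
        return sum(scores) / len(scores)


def locate_error(model, tokens):
    """Index of the least likely token; len(tokens) means the END marker."""
    scores = model.transition_log_probs(tokens)
    return min(range(len(scores)), key=scores.__getitem__)


def suggest_fixes(model, tokens, top_k=2):
    """Rank single-token edits at the suspect position (assumed scoring)."""
    i = locate_error(model, tokens)
    candidates = []
    for tok in sorted(model.vocab):
        candidates.append(tokens[:i] + [tok] + tokens[i:])  # insert before i
        if i < len(tokens) and tok != tokens[i]:
            candidates.append(tokens[:i] + [tok] + tokens[i + 1:])  # substitute
    if i < len(tokens):
        candidates.append(tokens[:i] + tokens[i + 1:])  # delete token i
    candidates.sort(key=model.mean_log_prob, reverse=True)
    return candidates[:top_k]


# Toy training data: token streams of syntactically correct statements.
corpus = [
    ["if", "(", "ID", ")", "{", "ID", ";", "}"],
    ["while", "(", "ID", ")", "{", "ID", ";", "}"],
    ["ID", "=", "ID", ";"],
    ["ID", "(", "ID", ")", ";"],
]
model = BigramModel(corpus)

broken = ["if", "(", "ID", "{", "ID", ";", "}"]  # the ")" is missing
print(locate_error(model, broken))               # suspect index 3, the "{"
for fix in suggest_fixes(model, broken):
    print(" ".join(fix))
```

On this toy input the top-ranked suggestion reinserts the missing `)`. The abstract speaks of syntactically valid fixes, so a practical tool would presumably also run each candidate through a parser before presenting it; that step is omitted here.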
