Automatic syntax error reporting and recovery in parsing expression grammars

Sérgio Queiroz De Medeiros, Fabio Mascarenhas, Gilney De Azevedo Alvez Junior

doi:10.1016/j.scico.2019.102373

Open Access

https://doi.org/10.1016/j.scico.2019.102373

Abstract

Error recovery is an essential feature for parsers that are plugged into Integrated Development Environments (IDEs), which must build Abstract Syntax Trees (ASTs) even for syntactically invalid programs in order to offer features such as automated refactoring and code completion. Parsing Expression Grammars (PEGs) are a formalism that naturally describes recursive top-down parsers using a restricted form of backtracking. Labeled failures are a conservative extension of PEGs that adds an error reporting mechanism for PEG parsers, and these labels can also be associated with recovery expressions to provide an error recovery mechanism. These expressions can use the full expressivity of PEGs to recover from syntactic errors. Manually annotating a large grammar with labels and recovery expressions can be difficult. In this work, we present two approaches, Standard and Unique, to automatically annotate a PEG with labels and to build the corresponding recovery expressions. The Standard approach annotates a grammar in a way similar to manual annotation, but it may insert labels incorrectly, while the Unique approach annotates a grammar more conservatively and does not insert labels incorrectly. We evaluate both approaches by using them to generate error recovering parsers for four programming languages: Titan, C, Pascal and Java. In our evaluation, the parsers produced using the Standard approach, after a manual intervention to remove the incorrectly added labels, gave an acceptable recovery for at least 70% of the files in each language. In turn, the acceptable recovery rate of the parsers produced via the Unique approach, without any manual intervention, ranged from 41% to 76%.

• We discuss two approaches, Standard and Unique, to build PEG-based error recovering parsers in a more automatic way.
• We build error recovering parsers for Titan, C, Pascal and Java.
• Algorithm Standard, with the help of manual intervention, gives an acceptable recovery for at least 70% of the syntactically invalid files of each language.
• Algorithm Unique, without manual intervention, gives an acceptable recovery rate that ranges from 41% to 76%.
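
The mechanism described in the abstract, labels that report an error and recovery expressions that let parsing continue, can be illustrated with a small hand-written sketch. Everything in it is an assumption made for illustration: the toy grammar `Prog <- Stmt*`, `Stmt <- NAME '='^eq NUM^num ';'^semi`, the label names `eq`, `num` and `semi`, and the chosen recovery expressions. It is annotated by hand and does not implement Algorithm Standard or Algorithm Unique.

```python
import re


def skip_to_semicolon(p):
    # Assumed recovery expression (!';' .)* : skip input up to the next ';'.
    while p.pos < len(p.text) and p.text[p.pos] != ";":
        p.pos += 1


def insert_nothing(p):
    # Assumed recovery expression "empty": go on as if the token were present.
    pass


# Hypothetical label table: each label maps to its recovery expression.
RECOVERY = {"eq": insert_nothing, "num": skip_to_semicolon, "semi": insert_nothing}


class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.errors = []  # (label, position) pairs recorded while recovering

    def skip_ws(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def match(self, pattern):
        """Ordinary PEG-style match: return the token, or None on failure."""
        self.skip_ws()
        m = re.compile(pattern).match(self.text, self.pos)
        if m is None:
            return None
        self.pos = m.end()
        return m.group()

    def expect(self, pattern, label):
        """Labeled expression p^label: on failure, record the label and
        run the recovery expression associated with it."""
        tok = self.match(pattern)
        if tok is None:
            self.errors.append((label, self.pos))
            RECOVERY[label](self)
        return tok

    def stmt(self):
        name = self.match(r"[A-Za-z_]\w*")
        if name is None:
            return None  # unlabeled failure: lets the repetition in prog() end
        self.expect(r"=", "eq")
        value = self.expect(r"\d+", "num")
        self.expect(r";", "semi")
        return ("assign", name, value)  # AST node, built even after an error

    def prog(self):
        stmts = []
        while (s := self.stmt()) is not None:
            stmts.append(s)
        return stmts


parser = Parser("a = 1; b = ; c = 3 d = 4;")
tree = parser.prog()
print(tree)
print(parser.errors)
```

On this input the sketch still produces one node per statement and records two labeled errors, a missing number in the second statement and a missing semicolon after the third, which is the kind of behavior an IDE needs from an error recovering parser.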

Highlights

Integrated Development Environments (IDEs) often require parsers that can recover from syntax errors and build syntax trees even for syntactically invalid programs, in order to conduct further analyses necessary for IDE features such as automated refactoring and code completion.
Parsing Expression Grammars (PEGs) [1] are a formalism used to describe the syntax of programming languages, as an alternative to Context-Free Grammars (CFGs).
The remainder of this paper is organized as follows: Section 2 discusses error recovery in PEGs using labeled failures and recovery expressions; Section 3 shows Algorithm Standard, which automatically annotates a PEG with labels and associates a recovery expression with each label; Section 4 evaluates the use of Algorithm Standard to annotate the grammars of four programming languages: Titan, C, Pascal, and Java; Section 5 discusses conservative approaches to inserting labels and presents Algorithm Unique, which inserts only correct labels; Section 6 compares the use of both algorithms to annotate the Titan, C, Pascal and Java grammars; Section 7 discusses related work on error reporting and error recovery; Section 8 gives some concluding remarks.

Summary

Introduction

Integrated Development Environments (IDEs) often require parsers that can recover from syntax errors and build syntax trees even for syntactically invalid programs, in order to conduct further analyses necessary for IDE features such as automated refactoring and code completion. This paper extends the previous one by evaluating the use of Algorithm Standard to build error recovering parsers for C, Pascal and Java. The remainder of this paper is organized as follows: Section 2 discusses error recovery in PEGs using labeled failures and recovery expressions; Section 3 shows Algorithm Standard, which automatically annotates a PEG with labels and associates a recovery expression with each label; Section 4 evaluates the use of Algorithm Standard to annotate the grammars of four programming languages: Titan, C, Pascal, and Java; Section 5 discusses conservative approaches to inserting labels and presents Algorithm Unique, which inserts only correct labels; Section 6 compares the use of both algorithms to annotate the Titan, C, Pascal and Java grammars; Section 7 discusses related work on error reporting and error recovery; Section 8 gives some concluding remarks.

Error Recovery in PEGs with Labeled Failures

Automatic Insertion of Labels and Recovery Expressions

Evaluating Algorithm Standard

Pascal

Conservative Insertion of Labels

Evaluating the Conservative Insertion of Labels

Related Work

Findings

Conclusion

Journal: Science of Computer Programming
Publication Date: Nov 27, 2019
Citations: 5
License type: publisher-specific-oa

Similar Papers

Error recovery in parsing expression grammars through labeled failures and its implementation based on a parsing machine
Sérgio Queiroz De Medeiros ... Fabio Mascarenhas
Journal of Visual Languages & Computing | VOL. 49
16 Oct 2018

Syntax error recovery in parsing expression grammars
Sérgio Medeiros ... Fabio Mascarenhas
09 Apr 2018

The Computational Power of Parsing Expression Grammars
Bruno Loff ... Rogério Reis
01 Jan 2018

Lexical Parsing Expression Recognition Schemata
Markus Lumpe
01 Sep 2015