Abstract

Error recovery is an essential feature for a parser that is to be plugged into Integrated Development Environments (IDEs), which must build Abstract Syntax Trees (ASTs) even for syntactically invalid programs in order to offer features such as automated refactoring and code completion. Parsing Expression Grammars (PEGs) are a formalism that naturally describes recursive top-down parsers using a restricted form of backtracking. Labeled failures are a conservative extension of PEGs that adds an error reporting mechanism for PEG parsers, and these labels can also be associated with recovery expressions to provide an error recovery mechanism. These expressions can use the full expressivity of PEGs to recover from syntactic errors. Manually annotating a large grammar with labels and recovery expressions can be difficult. In this work, we present two approaches, Standard and Unique, to automatically annotate a PEG with labels and to build their corresponding recovery expressions. The Standard approach annotates a grammar in a way similar to manual annotation, but it may insert labels incorrectly, while the Unique approach annotates a grammar more conservatively and does not insert labels incorrectly. We evaluate both approaches by using them to generate error recovering parsers for four programming languages: Titan, C, Pascal and Java. In our evaluation, the parsers produced using the Standard approach, after a manual intervention to remove the incorrectly added labels, gave an acceptable recovery for at least 70% of the files in each language. In turn, the acceptable recovery rate of the parsers produced via the Unique approach, without the need for manual intervention, ranged from 41% to 76%.

  • We discuss two approaches, Standard and Unique, to build PEG-based error recovering parsers in a more automatic way.

  • We build error recovering parsers for Titan, C, Pascal and Java.

  • Algorithm Standard, with the help of manual intervention, gives an acceptable recovery for at least 70% of the syntactically invalid files of each language.

  • Algorithm Unique, without manual intervention, gives an acceptable recovery rate that ranges from 41% to 76%.

Highlights

  • Integrated Development Environments (IDEs) often require parsers that can recover from syntax errors and build syntax trees even for syntactically invalid programs, in order to conduct the further analyses needed for IDE features such as automated refactoring and code completion.

  • Parsing Expression Grammars (PEGs) [1] are a formalism used to describe the syntax of programming languages, as an alternative to Context-Free Grammars (CFGs).

  • The remainder of this paper is organized as follows: Section 2 discusses error recovery in PEGs using labeled failures and recovery expressions; Section 3 shows Algorithm Standard, which automatically annotates a PEG with labels and associates a recovery expression with each label; Section 4 evaluates the use of Algorithm Standard to annotate the grammars of four programming languages: Titan, C, Pascal, and Java; Section 5 discusses conservative approaches to inserting labels and presents Algorithm Unique, which inserts only correct labels; Section 6 compares the use of both algorithms to annotate the Titan, C, Pascal and Java grammars; Section 7 discusses related work on error reporting and error recovery; Section 8 gives some concluding remarks.


Summary

Introduction

Integrated Development Environments (IDEs) often require parsers that can recover from syntax errors and build syntax trees even for syntactically invalid programs, in order to conduct the further analyses needed for IDE features such as automated refactoring and code completion. This paper extends previous work by evaluating the use of Algorithm Standard to build error recovering parsers for C, Pascal and Java. The remainder of this paper is organized as follows: Section 2 discusses error recovery in PEGs using labeled failures and recovery expressions; Section 3 shows Algorithm Standard, which automatically annotates a PEG with labels and associates a recovery expression with each label; Section 4 evaluates the use of Algorithm Standard to annotate the grammars of four programming languages: Titan, C, Pascal, and Java; Section 5 discusses conservative approaches to inserting labels and presents Algorithm Unique, which inserts only correct labels; Section 6 compares the use of both algorithms to annotate the Titan, C, Pascal and Java grammars; Section 7 discusses related work on error reporting and error recovery; Section 8 gives some concluding remarks.
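
As a rough illustration of the idea of labeled failures with recovery expressions, the sketch below hand-writes a tiny recursive-descent parser for a hypothetical assignment grammar, `Stmt <- ID ['=']^eq [NUM]^num [';']^semi`. The grammar, the label names (`eq`, `num`, `semi`), the `RECOVERY` table and the choice of recovery expressions are all assumptions made for this example; it is neither Algorithm Standard nor Algorithm Unique, and it does not reproduce any of the grammars evaluated in the paper. Each recovery expression has the shape `(!stop .)*`, that is, it skips tokens until a synchronization token is next.

```python
import re

# Hypothetical token syntax: identifiers, integers, and the symbols '=' and ';'.
TOKEN = re.compile(r"\s*(?:(?P<ID>[A-Za-z_]\w*)|(?P<NUM>\d+)|(?P<SYM>[=;]))")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.toks = tokens
        self.pos = 0
        self.errors = []  # list of (label, token index) pairs

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else ("EOF", "")

    def accept(self, kind, value=None):
        # Ordinary PEG-style match: consume on success, return None on failure.
        k, v = self.peek()
        if k == kind and (value is None or v == value):
            self.pos += 1
            return v
        return None

    def skip_until(self, stop):
        # Recovery expression of the form (!stop .)*
        while self.peek()[0] != "EOF" and not stop(self.peek()):
            self.pos += 1

    def expect(self, kind, value, label):
        # Annotated match [p]^label: on failure, record the label and
        # run the recovery expression associated with it.
        v = self.accept(kind, value)
        if v is None:
            self.errors.append((label, self.pos))
            RECOVERY[label](self)
        return v

    def stmt(self):
        name = self.accept("ID")
        if name is None:
            return None  # unlabeled failure: the caller decides what to do
        self.expect("SYM", "=", "eq")
        value = self.expect("NUM", None, "num")
        self.expect("SYM", ";", "semi")
        return ("assign", name, value)

    def program(self):
        stmts = []
        while self.peek()[0] != "EOF":
            s = self.stmt()
            if s is None:
                # Token cannot start a statement: report it and skip it.
                self.errors.append(("stmt", self.pos))
                self.pos += 1
            else:
                stmts.append(s)
        return stmts

# One recovery expression per label (an assumption of this sketch).
RECOVERY = {
    "eq": lambda p: p.skip_until(lambda t: t[0] == "NUM" or t == ("SYM", ";")),
    "num": lambda p: p.skip_until(lambda t: t == ("SYM", ";")),
    "semi": lambda p: p.skip_until(lambda t: t[0] == "ID"),
}

def parse(text):
    p = Parser(tokenize(text))
    return p.program(), p.errors
```

Calling `parse("x = 1; y = ; z = 3")` still yields a tree with three assignments, the second with a missing value, alongside one `num` and one `semi` error. This is the behaviour an IDE needs: an AST is produced even though the input is syntactically invalid.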

Error Recovery in PEGs with Labeled Failures
Automatic Insertion of Labels and Recovery Expressions
Evaluating Algorithm Standard
Pascal
Conservative Insertion of Labels
Evaluating the Conservative Insertion of Labels
Related Work
Findings
Conclusion