Abstract

Parsers are used in different software development scenarios such as compiler construction, data format processing, machine-level translation, and natural language processing. Due to the widespread use of parsers, different tools exist that aim at automating their generation. Two of the most common parser generation tools are the classic Lex/Yacc and ANTLR. Even though ANTLR provides more advanced features, Lex/Yacc is still the preferred choice in many university courses. Different qualitative comparisons of the features provided by both approaches exist, but no study evaluates empirical features such as language implementor productivity and tool simplicity, intuitiveness, and maintainability. In this article, we present such an empirical study by conducting an experiment with undergraduate students of a Software Engineering degree. Two randomly assigned groups of students implement the same language, each using a different parser generator, and we statistically compare their performance with different measures. In the context of the academic study conducted, ANTLR shows significant benefits for most of the empirical features measured.

Highlights

  • Parsing, also known as syntax or syntactic analysis, is the process of analyzing a string of terminal symbols conforming to the rules of a formal grammar [1]

  • Ever since the creation of the Programming Language Design and Implementation course, students had implemented their lexical and syntax analyzers with the BYaccJ and JFlex generators; the 2020-2021 academic year is the first in which we introduce the use of ANTLR

  • The empirical comparison undertaken shows that, for the implementation of a programming language of medium complexity by year-3 students of a Software Engineering degree, the ANTLR tool shows significant benefits compared to Lex/Yacc

Summary

Introduction

Parsing, also known as syntax or syntactic analysis, is the process of analyzing a string of terminal symbols conforming to the rules of a formal grammar [1]. Such a grammar may describe a natural language (e.g., English or French), a computer programming language, or even a data format. The process of recognizing the terminal symbols, called tokens, from a sequence of characters is called lexical analysis [2]. Parsers are software components that, using a lexer to recognize the terminal symbols of a given language, analyze an input, check that its syntax is correct, and build a tree that represents the input program as a hierarchical data structure [1]. Parsers are used for different tasks in computer science, such as performing machine-level translation of programs (e.g., (de)compilers, transpilers, and (dis)assemblers), creating software development tools (e.g., profilers, debuggers, and linkers), natural language processing (e.g., dependency and constituency parsing), and data format processing (e.g., JSON, GraphViz DOT, and DNS zone files).
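
To make these concepts concrete, the following is a minimal sketch of what a parser specification looks like in ANTLR 4, one of the two generators compared in the article; the Expr grammar and its rules are illustrative and are not taken from the study. Lexer rules (uppercase names) define the tokens produced during lexical analysis, while parser rules (lowercase names) define the syntax to be checked and the shape of the tree the parser builds:

    grammar Expr;

    // Parser rules: a program is one or more statements,
    // each an arithmetic expression terminated by a newline.
    prog : stat+ ;
    stat : expr NEWLINE ;

    // ANTLR 4 accepts left-recursive rules; alternatives listed
    // first bind tighter, so '*' and '/' have higher precedence.
    expr : expr ('*'|'/') expr
         | expr ('+'|'-') expr
         | INT
         | '(' expr ')'
         ;

    // Lexer rules: the terminal symbols (tokens) of the language.
    INT     : [0-9]+ ;
    NEWLINE : '\r'? '\n' ;
    WS      : [ \t]+ -> skip ;   // whitespace never reaches the parser

A single file such as this one drives the generation of both the lexer and the parser; the classic Lex/Yacc workflow, by contrast, splits the token definitions (Lex) and the grammar rules (Yacc) into two separate specification files.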


