Abstract

Parsing is a key task in computer science, with applications in compilers, natural language processing, syntactic pattern matching, and formal language theory. With the recent development of deep learning techniques, several artificial intelligence applications, especially in natural language processing, have combined traditional parsing methods with neural networks to drive the search in the parsing space, resulting in hybrid architectures using both symbolic and distributed representations. In this article, we show that existing symbolic parsing algorithms for context-free languages can cross the border and be entirely formulated over distributed representations. To this end, we introduce a version of the traditional Cocke–Younger–Kasami (CYK) algorithm, called distributed (D)-CYK, which is entirely defined over distributed representations. D-CYK uses matrix multiplication on real number matrices of a size independent of the length of the input string. These operations are compatible with recurrent neural networks. Preliminary experiments show that D-CYK approximates the original CYK algorithm. By showing that CYK can be entirely performed on distributed representations, we open the way to the definition of recurrent layer neural networks that can process general context-free languages.

Highlights

  • In computer science, parsing is defined as the process of identifying one or more syntactic structures for an input string, according to some underlying formal grammar specifying the syntax of the reference language

  • General parsing methods have been developed for ambiguous context-free grammar (CFG) that construct compact representations, called parse forests, of all possible parse trees assigned by the underlying grammar to the input string

  • The predominance of symbolic, grammar-based algorithms for parsing of natural language has been successfully challenged by neural networks, which are based on distributed representations

Read more

Summary

Introduction

In computer science, parsing is defined as the process of identifying one or more syntactic structures for an input string, according to some underlying formal grammar specifying the syntax of the reference language. Neural network parsers based on CFGs are hybrid architectures, where a neural network using distributed representations is exploited to drive the search in a space defined according to some symbolic grammar and some discrete data structure, as for instance the parsing table used by the already mentioned CYK algorithm. In these architectures, the high-level, human-readable, symbolic representation is kept separate from the low-level distributed representation. This is the crucial property that allows us to implement the D-CYK algorithm using recurrent neural networks, without the addition of any symbolic data structure

Related Work
CYK Algorithm
Distributed Representations with Holographic Reduced Representations
The CYK Algorithm on Distributed Representations
Encoding the Table P in matrices Pleft and Pright
Encoding and Using Unary Rules
Encoding and Using Binary Rules
Interpreting D-CYK as a Recurrent Neural Network
Building Recurrent Blocks
Decoding Phase
Experiments
Experimental Setup
Results and Discussion
Conclusions and Future Work

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.