Abstract

This paper presents a methodology for a rule-based, bottom-up parsing technique for Modern Standard Arabic (MSA) in the Context Free Grammar (CFG) formalism with a Phrase Structure Grammar (PSG) representation, where the grammar is automatically extracted from a syntactically annotated corpus. The extracted grammar is used to build an automatic lexicon and grammar rules module. The extracted CFG is further transformed into a Probabilistic Context Free Grammar (PCFG), whose probabilities are also calculated automatically and which could be used in a hybrid approach. The corpus used is the Penn Arabic Treebank (PATB), and the algorithm is implemented with the Natural Language Toolkit (NLTK). The parser showed that automatic grammar extraction improved the grammar building phase in both coverage of structures and time needed, but manual constraints still need to be added. Automatic grammar extraction is able to enhance rule-based grammar parsers, and it will enable a new paradigm of statistically directed symbolic parsing.
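The extraction pipeline described in the abstract can be illustrated with a minimal sketch. It reads bracketed trees, collects one production per internal node, splits the productions into grammar rules and a lexicon, and assigns each production the relative-frequency estimate P(A → β) = count(A → β) / count(A). The function names, the toy trees, and the transliterated words are assumptions made for illustration; they are not taken from the PATB or from the paper's implementation, which uses NLTK rather than the standard library alone.

```python
from collections import Counter

def parse_bracketed(text):
    """Parse one bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # terminal (a word)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()

def productions(tree):
    """Yield (lhs, rhs, is_lexical) for every internal node of the tree."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    is_lexical = all(isinstance(c, str) for c in children)
    yield (label, rhs, is_lexical)
    for child in children:
        if isinstance(child, tuple):
            yield from productions(child)

def extract_pcfg(bracketed_trees):
    """Return (grammar_rules, lexicon), each mapping (lhs, rhs) to a probability."""
    counts = Counter()
    for text in bracketed_trees:
        counts.update(productions(parse_bracketed(text)))
    lhs_totals = Counter()
    for (lhs, _rhs, _lex), n in counts.items():
        lhs_totals[lhs] += n
    rules, lexicon = {}, {}
    for (lhs, rhs, is_lexical), n in counts.items():
        target = lexicon if is_lexical else rules
        target[(lhs, rhs)] = n / lhs_totals[lhs]  # relative-frequency estimate
    return rules, lexicon

# Hypothetical toy trees with made-up labels and words (not PATB data).
TOY_TREES = [
    "(S (VP (VERB kataba) (NP (NOUN alwaladu)) (NP (NOUN aldarsa))))",
    "(S (VP (VERB qaraa) (NP (NOUN albintu))))",
]

rules, lexicon = extract_pcfg(TOY_TREES)
for (lhs, rhs), p in sorted(rules.items()):
    print(f"{lhs} -> {' '.join(rhs)} [{p:.2f}]")
```

In this toy sample the two VP expansions each occur once, so each receives probability 0.5, while the probabilities of all productions sharing a left-hand side sum to 1. Keeping the lexicon separate from the phrase-level rules mirrors the abstract's distinction between the lexicon and the grammar rules module.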

Highlights

  • Parsing is responsible for determining the syntactic structure of an expression

  • The syntactic analysis process has been defined as "the process of analyzing a sequence of tokens to determine its grammatical structure with respect to a given formal grammar"

  • Parsing is used to refer to the process of automatically building a syntactic analysis of sentences according to a given grammar [8]. Parsing transforms input text into a data structure, usually a tree, which is suitable for later processing and which captures the implied hierarchy of the input; different grammatical frameworks have been proposed for this [2]

Summary

INTRODUCTION

Parsing is responsible for determining the syntactic structure of an expression. Syntactic parsing is a vital step in any Natural Language Processing (NLP) application. This paper presents an automatic extraction technique for building the lexicon and grammatical rules used in a symbolic rule-based parser. A simple question about the ordering of the grammatical rules, e.g. which type of phrase should be parsed (identified) first, the prepositional phrase or the noun phrase, is usually answered by reasoning alone: the prepositional phrase, as it has less structural diversity. This type of decision, and many others, should instead be made with statistical guidance obtained through grammar extraction.

ALANSARY: Modern Standard Arabic Grammar Automatic Extraction from Penn Arabic Treebank
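The rule-ordering question raised in the introduction can be approached with simple counts over extracted productions. The sketch below measures "structural diversity" as the number of distinct right-hand sides observed for each phrase label and orders the labels from least to most diverse. This measure, the function names, and the toy productions are assumptions chosen for illustration; the paper may define or use diversity differently.

```python
from collections import defaultdict

def structure_diversity(productions):
    """Map each phrase label to the number of distinct right-hand sides seen for it."""
    seen = defaultdict(set)
    for lhs, rhs in productions:
        seen[lhs].add(tuple(rhs))
    return {lhs: len(rhs_set) for lhs, rhs_set in seen.items()}

def parse_order(productions):
    """Order phrase labels from least to most diverse (ties broken alphabetically)."""
    diversity = structure_diversity(productions)
    return sorted(diversity, key=lambda lhs: (diversity[lhs], lhs))

# Hypothetical productions, as they might come out of a treebank extraction step.
TOY_PRODUCTIONS = [
    ("PP", ("PREP", "NP")),
    ("PP", ("PREP", "NP")),
    ("NP", ("NOUN",)),
    ("NP", ("DET", "NOUN")),
    ("NP", ("NOUN", "ADJ")),
    ("NP", ("NOUN", "NP")),
]

print(structure_diversity(TOY_PRODUCTIONS))
print(parse_order(TOY_PRODUCTIONS))
```

With these toy counts the prepositional phrase has a single structure and the noun phrase has four, so PP is placed first. On a real treebank the same counts would either confirm or correct the intuition-based ordering.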

RELATED WORKS:
PARSING APPROACHES:
FORMAL LANGUAGE CFG AND REWRITE RULES:
BASIC SEARCH AND MATCHING STRATEGIES FOR PARSING:
GRAMMAR DEVELOPMENT STRATEGIES:
THE PATB CORPUS:
GRAMMAR EXTRACTION AND PARSING
Algorithm
The Training Phase
Calculate PCFG
10 CONCLUSIONS
Findings
REFERENCES:
