Abstract
This paper presents a methodology for a rule-based, bottom-up parsing technique for Modern Standard Arabic (MSA) in the Context Free Grammar (CFG) formalism with a Phrase Structure Grammar (PSG) representation, where the grammar is automatically extracted from a syntactically annotated corpus. The extracted grammar is used to build an automatic lexicon and a grammar rules module. The extracted CFG is then transformed, also automatically, into a Probabilistic Context Free Grammar (PCFG) that could be used in a hybrid approach. The corpus used is the Penn Arabic Treebank (PATB), and the algorithm is implemented with the Natural Language Toolkit (NLTK). The parser showed that automatic extraction of the grammar improved the grammar building phase in both coverage of structures and time needed, but manual addition of constraints is still required. Automatic grammar extraction can enhance rule-based grammar parsers and will enable a new paradigm of statistically directed symbolic parsing.
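The extraction step described in the abstract can be illustrated with a minimal sketch. It reads bracketed trees, collects one production per tree node, separates a lexicon from the phrase rules, and turns rule counts into PCFG probabilities by relative frequency. Everything concrete here is an assumption made for illustration: the two toy trees, their simplified labels, and the transliterated words are invented rather than taken from the PATB, the helper names (`parse_bracketed`, `productions`, `extract`, `to_pcfg`) are hypothetical, and relative-frequency estimation is one common way to obtain a PCFG rather than necessarily the paper's exact procedure. The sketch uses only the Python standard library; NLTK, which the paper uses, offers comparable tree and grammar utilities.

```python
from collections import Counter, defaultdict

# Toy bracketed trees (hypothetical; NOT real PATB annotations).
TOY_TREES = [
    "(S (VP (VBD kataba) (NP (NN Alwaladu)) (NP (NN Aldarsa))))",
    "(S (VP (VBD dahaba) (NP (NN Alwaladu)) (PP (IN ilaY) (NP (NN Almadrasapi)))))",
]

def parse_bracketed(text):
    """Parse a bracketed tree into (label, children); leaves are plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1                      # consume ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()

def productions(tree):
    """Yield (lhs, rhs, is_lexical) for every internal node of the tree."""
    label, children = tree
    is_lexical = all(isinstance(c, str) for c in children)
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs, is_lexical
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)

def extract(trees):
    """Count every production and build a word -> set of tags lexicon."""
    rule_counts, lexicon = Counter(), defaultdict(set)
    for tree in trees:
        for lhs, rhs, is_lexical in productions(tree):
            rule_counts[(lhs, rhs)] += 1
            if is_lexical:
                lexicon[rhs[0]].add(lhs)
    return rule_counts, lexicon

def to_pcfg(rule_counts):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    totals = Counter()
    for (lhs, _), n in rule_counts.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in rule_counts.items()}

trees = [parse_bracketed(t) for t in TOY_TREES]
rule_counts, lexicon = extract(trees)
pcfg = to_pcfg(rule_counts)

for (lhs, rhs), p in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{p:.2f}]")
```

By construction, the probabilities of all rules sharing a left-hand side sum to one, which is the property that distinguishes the PCFG from the plain CFG extracted in the first step.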
Highlights
Parsing is responsible for determining the syntactic structure of an expression.
The syntactic analysis process has been defined as "the process of analyzing a sequence of tokens to determine its grammatical structure with respect to a given formal grammar".
Parsing refers to the process of automatically building a syntactic analysis of sentences according to a given grammar [8]. Parsing transforms input text into a data structure, usually a tree, that is suitable for later processing and captures the implied hierarchy of the input. Different grammatical frameworks have been proposed for this [2].
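To make the idea of turning input text into a tree concrete, the following is a minimal sketch of one bottom-up strategy, the CKY algorithm. The source states only that parsing is bottom-up, so CKY is chosen here for illustration and is not claimed to be the paper's algorithm. The grammar is a hypothetical toy in Chomsky normal form, the transliterated words are invented, and `cky_parse` is an assumed helper name.

```python
def cky_parse(tokens, lexical, binary, start="S"):
    """Bottom-up CKY parse; returns one tree as nested tuples, or None."""
    n = len(tokens)
    chart = {}  # chart[(i, j)] maps a category to one tree spanning tokens[i:j]

    # Width-1 spans: look each word up in the lexicon.
    for i, word in enumerate(tokens):
        chart[(i, i + 1)] = {cat: (cat, word) for cat in lexical.get(word, ())}

    # Wider spans: combine two adjacent smaller spans with a binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = {}
            for k in range(i + 1, j):
                for left_cat, left_tree in chart[(i, k)].items():
                    for right_cat, right_tree in chart[(k, j)].items():
                        for parent in binary.get((left_cat, right_cat), ()):
                            # Keep the first tree found for each category.
                            cell.setdefault(parent, (parent, left_tree, right_tree))
            chart[(i, j)] = cell

    return chart[(0, n)].get(start) if n else None

# Hypothetical toy grammar in Chomsky normal form (not extracted from PATB).
LEXICAL = {
    "dahaba": ["VBD"],
    "Alwaladu": ["NP"],
    "ilaY": ["IN"],
    "Almadrasapi": ["NP"],
}
BINARY = {
    ("VBD", "NP"): ["VP"],
    ("IN", "NP"): ["PP"],
    ("VP", "PP"): ["S"],
}

tree = cky_parse("dahaba Alwaladu ilaY Almadrasapi".split(), LEXICAL, BINARY)
print(tree)
```

The parser starts from the words and builds progressively larger constituents, so the resulting nested tuple encodes the hierarchy of the input. A sequence the toy grammar does not cover yields `None`.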
Summary
Parsing is responsible for determining the syntactic structure of an expression. Syntactic parsing is a vital step in any Natural Language Processing (NLP) application. This paper presents an automatic extraction technique for building the lexicon and grammatical rules to be used in a symbolic rule-based parser. A simple question about the ordering of grammatical rules, e.g., which type of phrase should be parsed (identified) first, prepositional phrase or noun phrase, is usually answered by reasoning alone: prepositional phrases, as they have less structural diversity. This type of decision, and many others, should instead be made with statistical guidance obtained through grammar extraction.
ALANSARY: Modern Standard Arabic Grammar Automatic Extraction from Penn Arabic Treebank