Abstract

Text parsing has received special attention since the earliest applications of natural language processing (NLP). The task is harder for Arabic, whose specific features make it quite different from, and even more ambiguous than, other natural languages when processed. In this paper, we discuss a new approach for chunking Arabic texts based on a combinatorial classification process. It is a modular chunker that first identifies the chunk heads using a combinatorial binary classification and then recognizes their types based on the parts of speech of the chunk heads already identified. For the experiments, we use more than 2300 words as training data. The evaluation of the chunker consists of two steps and gives results that we consider very satisfactory (average accuracy of 89.60% for the classification step and 80.46% for the full chunking process).
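The two-stage structure described above (a binary head/non-head decision, followed by type assignment from the head's part of speech) can be illustrated with a minimal sketch. It is not the implementation from the paper: the tag set, the `HEAD_POS_TO_CHUNK` mapping, the stub classifier `is_head_stub`, and the rule that non-head tokens attach to the next head are all assumptions made for illustration, and the combinatorial aspect of the classification is not reproduced. English placeholder tokens stand in for Arabic words.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, str]          # (word, part-of-speech tag)
Chunk = Tuple[str, List[str]]    # (chunk type, words in the chunk)

# Hypothetical mapping from the head's POS tag to a chunk type.
HEAD_POS_TO_CHUNK = {"NOUN": "NP", "VERB": "VP", "PREP": "PP"}


def is_head_stub(tokens: List[Token], i: int) -> bool:
    """Stand-in for a trained binary classifier (head vs. non-head).

    Assumption: a token is a head if its POS tag has a chunk type.
    A real system would use learned features instead of this rule.
    """
    return tokens[i][1] in HEAD_POS_TO_CHUNK


def chunk(tokens: List[Token],
          is_head: Callable[[List[Token], int], bool] = is_head_stub) -> List[Chunk]:
    """Step 1: mark heads. Step 2: type each chunk from its head's POS.

    Assumption: tokens preceding a head belong to that head's chunk.
    Trailing tokens with no head are grouped under the type "O".
    """
    chunks: List[Chunk] = []
    pending: List[str] = []
    for i, (word, pos) in enumerate(tokens):
        pending.append(word)
        if is_head(tokens, i):
            chunks.append((HEAD_POS_TO_CHUNK.get(pos, "O"), pending))
            pending = []
    if pending:
        chunks.append(("O", pending))
    return chunks


sentence = [("the", "DET"), ("boy", "NOUN"), ("reads", "VERB"),
            ("in", "PREP"), ("quickly", "ADV")]
result = chunk(sentence)
```

Keeping the head classifier as a replaceable argument mirrors the modular design mentioned in the abstract: the typing step depends only on which tokens were marked as heads, so each step can be evaluated separately.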
