Abstract

As a result of globalization and better quality of education, a significant percentage of the population in Arab countries has become bilingual or multilingual. This has increased the frequency of code-switching and code-mixing among Arabs in daily communication. Consequently, a huge amount of Code-Mixed (CM) content can be found on social media platforms. Such data could be analyzed and used in different Natural Language Processing (NLP) tasks to tackle the challenges that this multilingual phenomenon creates. Named Entity Recognition (NER), the process of identifying named entities in text, is a major component of several NLP systems. However, annotated CM data and resources for this task are lacking. This work aims to collect and build the first annotated CM Arabic-English corpus for NER. Furthermore, we construct a baseline NER system for Arabic-English CM text using deep neural networks and word embeddings, and enhance it using a pooling technique.
