Abstract

Information Extraction (IE) systems turn unstructured text into structured data by identifying the entities in a sentence and the relations between them, which is useful in applications such as Information Retrieval (IR) and Text Mining. A considerable amount of Arabic-language data is unstructured, so more powerful tools are needed to analyze and retrieve Arabic texts. This work uses Deep Learning techniques, specifically Long Short-Term Memory (LSTM) networks, to build a Named Entity Recognition (NER) model with Trax, a new deep learning library from Google. The NER model was trained on an Arabic NER dataset containing 37,000 labeled words, and the IE model uses an Arabic relation dataset containing 2,700 relations. The model was built on the Google Colab platform. Its ability to extract information from Arabic texts was evaluated on a dataset of 137 sentences, where it reached an F-measure of 72.56%. The results of this work could play an essential role in Arabic text mining, in converting Arabic text into a more understandable form, or in information retrieval from Arabic text corpora.
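As an illustration of the reported metric, the sketch below computes precision, recall, and the F-measure over extracted relation triples. The abstract does not say how matches were counted, so exact matching of (head, relation, tail) triples is an assumption here, and the function name `f_measure` and the sample triples are hypothetical placeholders rather than data from the paper.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (head entity, relation, tail entity)
Triple = Tuple[str, str, str]


def f_measure(gold: Iterable[Triple],
              predicted: Iterable[Triple],
              beta: float = 1.0) -> Tuple[float, float, float]:
    """Return (precision, recall, F_beta) under exact-match scoring (assumed)."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0

    # Avoid division by zero when nothing was extracted correctly.
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0

    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Placeholder triples for illustration only (not taken from the paper's dataset).
gold = [
    ("PERSON_A", "works_for", "ORG_X"),
    ("PERSON_B", "born_in", "CITY_Y"),
    ("ORG_X", "located_in", "CITY_Z"),
]
predicted = [
    ("PERSON_A", "works_for", "ORG_X"),
    ("PERSON_B", "born_in", "CITY_Y"),
    ("PERSON_B", "works_for", "ORG_X"),  # spurious extraction
]

precision, recall, f1 = f_measure(gold, predicted)
print(f"precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```

With `beta=1.0` this is the harmonic mean of precision and recall; a different matching rule (for example, partial entity overlap) would change the counts but not the formula.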
