Abstract

Most online platforms, applications, and websites handle massive amounts of heterogeneous, evolving data. These data must be structured and normalized before integration to improve search and increase the relevance of results. An ontology can address this critical task by managing data efficiently and providing structured formats through standards such as the Web Ontology Language (OWL). However, building an ontology can be costly, especially when done manually. In this context, we propose a new methodology for automatically building and learning a multilingual ontology, using Arabic as the base language, from a corpus collected from Wikipedia. The proposed methodology relies on finite-state transducers (FSTs), which are grouped into a cascade to reduce errors and minimize ambiguity. The produced ontology is then extended to English and French, as well as to language-independent images, via a translator we developed using APIs. The rationale for starting from the Arabic corpus to extract terms is twofold. First, entity linking is more convenient from Arabic to the other languages: many Wikipedia articles in English and French (for instance) have no associated Arabic article, whereas most Arabic articles do have English and French counterparts. Second, working with Arabic terms lets us enrich the Arabic module (dictionaries and graphs) of the free linguistic platform we use. To assess the efficiency of the proposed methodology, we evaluated it with standard performance metrics; the reported results are encouraging and promising.
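To make the described pipeline concrete, below is a minimal, hypothetical Python sketch of the two core ideas in the abstract: a cascade of transducer-like stages (modeled here with simple regex rewrites rather than real FST grammars) that tags candidate terms in Arabic text, and an OWL fragment (built with rdflib) that attaches Arabic, English, and French labels to an extracted concept. The pattern, namespace, class name, and sample sentence are illustrative assumptions, not the authors' actual grammars or data; in the paper's pipeline, the English and French labels would come from the API-based translator rather than being hard-coded.

```python
import re

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

# --- Stage 1: a toy "cascade" --------------------------------------------
# Each stage rewrites the output of the previous one. A real FST cascade
# (compiled from linguistic dictionaries and graphs) encodes far richer
# morphological and syntactic patterns than this single illustrative rule.
CASCADE = [
    # Tag "madinat X" ("city of X") expressions as place terms (assumed rule).
    (re.compile(r"(مدينة\s+\S+)"), r"<PLACE>\1</PLACE>"),
]

def apply_cascade(text: str) -> str:
    for pattern, template in CASCADE:
        text = pattern.sub(template, text)
    return text

# --- Stage 2: store an extracted concept as an OWL class -----------------
EX = Namespace("http://example.org/onto#")  # hypothetical namespace
g = Graph()
g.bind("owl", OWL)
g.bind("ex", EX)

city = EX.City
g.add((city, RDF.type, OWL.Class))
g.add((city, RDFS.label, Literal("مدينة", lang="ar")))  # Arabic base term
g.add((city, RDFS.label, Literal("city", lang="en")))   # English translation
g.add((city, RDFS.label, Literal("ville", lang="fr")))  # French translation

print(apply_cascade("زرت مدينة تونس"))  # tags the place expression
print(g.serialize(format="turtle"))     # multilingual OWL fragment
```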
