Abstract
With the advances in Natural Language Processing (NLP), the industry has been moving towards human-directed artificial intelligence (AI) solutions. Recently, chatbots and automated news generation have captured considerable attention. The goal is to automatically generate readable text from tabular or web data, commonly represented in the Resource Description Framework (RDF) format. The problem can thus be formulated as Data-to-Text (D2T) generation: producing human-readable natural language from structured non-linguistic data. Despite the significant work done for English, little effort has been directed towards low-resource languages such as Arabic. This work presents the first RDF D2T generation system for Arabic while addressing the low-resource limitation. We develop several models for the Arabic D2T task using transfer learning from large language models (LLMs) such as AraBERT, AraGPT2, and mT5. These models include a baseline Bi-LSTM Sequence-to-Sequence (Seq2Seq) model as well as encoder-decoder transformers such as BERT2BERT, BERT2GPT, and T5. We then provide a detailed comparative study highlighting the strengths and limitations of these methods, setting the stage for further advances in the field. We also introduce a new Arabic dataset (AraWebNLG) that can be used to develop new models in the field. To ensure a comprehensive evaluation, general-purpose automated metrics (BLEU and perplexity) are used alongside task-specific human evaluation metrics covering the accuracy of content selection and the fluency of the generated text. The results highlight the importance of pre-training on a large corpus of Arabic data and show that transfer learning from AraBERT gives the best performance. Text-to-text pre-training using mT5 achieves the second-best performance, even with multilingual weights.
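For illustration, the BERT2BERT setup described above can be realized by warm-starting both the encoder and the decoder of an encoder-decoder transformer from AraBERT weights. The following minimal sketch (not the authors' implementation) shows one way to do this with the Hugging Face Transformers library; the AraBERT checkpoint name, the triple linearization format, and the example triple are assumptions made for illustration only.

```python
# Minimal sketch (assumed, not the authors' code) of an Arabic RDF D2T
# pipeline: RDF triples are linearized into a flat string and fed to a
# BERT2BERT encoder-decoder warm-started from AraBERT checkpoints.
from transformers import AutoTokenizer, EncoderDecoderModel

def linearize_triples(triples):
    """Flatten (subject, predicate, object) triples into one input string,
    a common preprocessing step for D2T models."""
    return " [SEP] ".join(f"{s} | {p} | {o}" for s, p, o in triples)

checkpoint = "aubmindlab/bert-base-arabertv2"  # assumed AraBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Warm-start both encoder and decoder from AraBERT (the BERT2BERT setup);
# the cross-attention weights are new and must be learned during fine-tuning.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(checkpoint, checkpoint)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Hypothetical input triple; before fine-tuning on D2T pairs
# (e.g., AraWebNLG), the generated text will not be meaningful.
triples = [("عمان", "capital", "الأردن")]
inputs = tokenizer(linearize_triples(triples), return_tensors="pt")
output_ids = model.generate(inputs.input_ids,
                            attention_mask=inputs.attention_mask,
                            max_length=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The same linearized inputs could equally be paired with a decoder warm-started from AraGPT2 (the BERT2GPT variant) or fed to mT5 in a text-to-text formulation; only the model construction step changes.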