UTILIZING RETRIEVAL-AUGMENTED GENERATION IN LARGE LANGUAGE MODELS TO ENHANCE INDONESIAN LANGUAGE NLP

Herdian Tohir, Nita Merlina, Muhammad Haris

doi:10.33480/jitk.v10i2.5916

Open Access

https://doi.org/10.33480/jitk.v10i2.5916

Abstract

Improving Large Language Models (LLMs) such as ChatGPT with Retrieval-Augmented Generation (RAG) is an urgent need in the development of natural language translation technology and dialogue systems. LLMs often struggle with specialized requests that require information outside their training data. This study discusses the use of RAG with large language models to improve Natural Language Processing (NLP) performance in Indonesian, a language that has so far been poorly supported by high-quality data, and to overcome the limitations of traditional language models in understanding Indonesian context. The method combines retrieval (searching external information) with generation (producing text): through the retrieval process, the model draws on a broader and more structured base of data to produce more accurate and relevant text. The data used includes an Indonesian corpus consisting of the Indonesian translation of all 30 juz of the Quran. The trial results show that the RAG approach significantly improves model performance across various NLP tasks, including token usage optimization, text classification, and context understanding, by increasing the accuracy and relevance of the results.
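The retrieve-then-generate combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The toy corpus, the bag-of-words cosine scoring, and the names `retrieve` and `build_prompt` are all assumptions made for illustration, and the final call to a language model is left out. The sketch only shows how a query selects the most relevant passage and how that passage is placed into the prompt, so that the model receives a short, focused context instead of the whole corpus.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus (assumption): stands in for the Indonesian corpus
# used in the study, which is not reproduced here.
CORPUS = [
    "Jakarta adalah ibu kota Indonesia.",
    "Bahasa Indonesia adalah bahasa resmi negara Indonesia.",
    "Nasi goreng adalah makanan populer di Indonesia.",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, corpus, k=1):
    """Retrieval step: return the k passages most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        corpus,
        key=lambda doc: cosine(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Generation step input: place retrieved passages ahead of the question.

    The resulting string would be sent to an LLM; that call is omitted here.
    """
    context = "\n".join(f"- {p}" for p in passages)
    return f"Konteks:\n{context}\n\nPertanyaan: {query}\nJawaban:"


if __name__ == "__main__":
    question = "Apa ibu kota Indonesia?"
    top_passages = retrieve(question, CORPUS, k=1)
    print(build_prompt(question, top_passages))
```

A production system would typically replace the bag-of-words scoring with dense embeddings and a vector index, but the control flow (retrieve, assemble the context, generate) stays the same.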

Journal: JITK (Jurnal Ilmu Pengetahuan dan Teknologi Komputer)
Publication Date: Nov 19, 2024
License type: CC BY-NC 4.0
