Integrating Retrieval-Augmented Generation with Large Language Model Mistral 7b for Indonesian Medical Herb

Diash Firdaus, Yuni Kulsum, Idi Sumardi

doi:10.14421/jiska.2024.9.3.230-243

Abstract

Large Language Models (LLMs) are artificial intelligence systems that use deep learning, particularly transformer architectures, to process and generate text. One such model, Mistral 7b, has 7 billion parameters and is optimized for performance and efficiency in natural language processing tasks. It outperforms comparable models, such as LLaMa2 7b and LLaMa 1, across various benchmarks, especially in reasoning, mathematics, and coding. LLMs have also shown significant progress in answering medical queries. This research draws on Indonesia’s rich biodiversity, which includes approximately 9,600 medicinal plant species out of 30,000 known species. The study is motivated by the observation that LLMs such as ChatGPT and Gemini often rely on internet data of uncertain validity and frequently give generic answers that do not mention specific herbal plants found in Indonesia. To address this, the dataset for pre-training the model is drawn from academic journals on Indonesian medicinal herbal plants. The research process involves collecting these journals, preprocessing them with Langchain, embedding the text with sentence transformers, and using Faiss CPU for efficient similarity search. The Retrieval-Augmented Generation (RAG) process is then applied to Mistral 7b, allowing it to give accurate, dataset-driven responses to user queries. The model's performance is evaluated using human evaluation, ROUGE metrics (recall, precision, and F1 measure), and METEOR scores. The results show that the RAG Mistral 7b model achieved a METEOR score of 0.22%, outperforming the LLaMa2 7b model, which scored 0.14%.
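The retrieval step described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the bag-of-words `embed` function stands in for a sentence-transformer embedding, the sorted cosine-similarity scan stands in for a Faiss index, and the passages in `CHUNKS`, along with every function name, are hypothetical placeholders rather than data or code from the study.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder passages; NOT taken from the paper's journal dataset.
CHUNKS = [
    "Curcuma longa (kunyit) rhizome is traditionally used to ease digestive complaints.",
    "Andrographis paniculata (sambiloto) leaves are studied for fever reduction.",
    "Zingiber officinale (jahe) is commonly used against nausea.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a sentence-transformer)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (stand-in for a Faiss search)."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context):
    """Assemble a prompt that asks the generator to answer from retrieved context."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    question = "Which herb is used against nausea?"
    top = retrieve(question, CHUNKS, k=1)
    print(build_prompt(question, top))
```

In a full pipeline, the string returned by `build_prompt` would be passed to the generator model (Mistral 7b in the study), and the toy vectors would be replaced by dense embeddings stored in a vector index.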


Journal: JISKA (Jurnal Informatika Sunan Kalijaga)
Publication Date: Sep 25, 2024
License type: CC BY-NC 4.0
