Fact-checking Vietnamese Information Using Knowledge Graph, Datalog, and KG-BERT

Huong T Duong,Van H Ho,Phuc Do

doi:10.1145/3624557

Abstract

In the era of digital information, ensuring the accuracy and reliability of information is crucial, making fact-checking a vital process. Currently, English fact-checking has thrived due to various language processing tools and ample datasets. However, the same cannot be said for Vietnamese fact-checking, which faces significant challenges due to the lack of such resources. To address these challenges, we propose a model for checking Vietnamese facts by synthesizing three popular technologies: Knowledge Graph (KG), Datalog, and KG-BERT. The KG serves as the foundation for the fact-checking process, containing a dataset of Vietnamese information. Datalog, a logical programming language, is used with inference rules to complete the knowledge within the Vietnamese KG. KG-BERT, a Deep Learning (DL) model, is then trained on this KG to rapidly and accurately classify information that needs fact-checking. Furthermore, to put Vietnamese complex sentences into the fact-checking model, we present a solution for extracting triples from these sentences. This approach also contributes significantly to the ease of constructing foundational datasets for the Vietnamese KG. To evaluate the model's performance, we create a Vietnamese dataset comprising 130,190 samples to populate the KG. Using Datalog, we enrich this graph with additional knowledge. The KG is then utilized to train the KG-BERT model, achieving an impressive accuracy of 95%. Our proposed solution shows great promise for fact-checking Vietnamese information and has the potential to contribute to the development of fact-checking tools and techniques for other languages. Overall, this research makes a significant contribution to the field of data science by providing an accurate solution for fact-checking information in Vietnamese language contexts.

Full Text