Multilingual Named Entity Recognition Model for Indonesian Health Insurance Question Answering System

Budi Sulistiyo Jati, S.T Muhammad Nur Rizal, St Widyawan

doi:10.1109/icoiact50329.2020.9332027

Abstract

Named Entity Recognition (NER) is an information extraction task that finds entities in unstructured text and classifies them into predetermined categories. In this study, NER is used to find location, organization, financial task, administrative task, and healthcare facility entities in a dataset of chats and public service complaints from the Indonesian national health insurance. The methods used are the Bidirectional Encoder Representations from Transformers (BERT) Multilingual Cased and BERT Multilingual Uncased models. Pre-processing in this research consists of tokenization, formalization, and tag distribution analysis. The data is then converted into BERT input features consisting of token embeddings, positional embeddings, and an attention mask. In the experiments, the BERT Multilingual Uncased model achieves a total average F1 score of 83.52 and the BERT Multilingual Cased model achieves a total average F1 score of 85.41. These results show that BERT Multilingual can be applied to an Indonesian dataset and that the cased model achieves the better F1 score.
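The conversion of a tagged sentence into fixed-length model inputs can be illustrated with a minimal sketch. It is not the authors' pipeline: the vocabulary, the tag names (`B-ORG`, `B-LOC`), the example sentence, the maximum length, and the helper `build_features` are all hypothetical, and a real BERT tokenizer would additionally split words into subword pieces. The sketch only shows how token IDs, an attention mask, and label IDs are padded to a common length.

```python
# Toy sketch: turn one tagged sentence into fixed-length BERT-style inputs.
# The vocabulary, tag set, and max length are hypothetical illustrations.

PAD, UNK, CLS, SEP = "[PAD]", "[UNK]", "[CLS]", "[SEP]"
IGNORE = -100  # label value commonly used to exclude a position from the loss


def build_features(tokens, tags, vocab, tag2id, max_len):
    """Return (input_ids, attention_mask, label_ids), each of length max_len."""
    # Reserve two positions for the [CLS] and [SEP] special tokens.
    tokens = tokens[: max_len - 2]
    tags = tags[: max_len - 2]

    pieces = [CLS] + tokens + [SEP]
    input_ids = [vocab.get(p, vocab[UNK]) for p in pieces]
    # Special tokens carry no entity tag, so they get the ignore label.
    label_ids = [IGNORE] + [tag2id[t] for t in tags] + [IGNORE]
    # 1 marks a real token, 0 marks padding the model should not attend to.
    attention_mask = [1] * len(input_ids)

    pad = max_len - len(input_ids)
    input_ids += [vocab[PAD]] * pad
    label_ids += [IGNORE] * pad
    attention_mask += [0] * pad
    return input_ids, attention_mask, label_ids


vocab = {PAD: 0, UNK: 1, CLS: 2, SEP: 3, "kantor": 4, "bpjs": 5, "di": 6, "sleman": 7}
tag2id = {"O": 0, "B-ORG": 1, "B-LOC": 2}

ids, mask, labels = build_features(
    ["kantor", "bpjs", "di", "sleman"],
    ["O", "B-ORG", "O", "B-LOC"],
    vocab, tag2id, max_len=8,
)
print(ids)
print(mask)
print(labels)
```

In this toy setup the example tokens are already lowercased, which mirrors what an uncased model sees; a cased model would keep the original capitalization, which can carry a signal for named entities.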

Full Text