NLP Text Classification for COVID-19 Automatic Detection from Radiology Report in Indonesian Language

Nunung Nurul Qomariyah, Raphael Reynaldi, Dimitar Kazakov, Sri Dhuny Atas Asri, Monique Senjaya, Ardelia Shaula Araminta

doi:10.1109/isriti56927.2022.10053077

Abstract

Radiology is an important assessment tool for patients with pulmonary disease. Radiology images are usually accompanied by a written report from a radiologist, which is passed on to the referring physicians. These reports are written in natural language, and their structure can vary with the language used. In our study, the radiology reports were collected from an Indonesian hospital and written in Bahasa Indonesia. We performed automatic text classification to assign each radiology report to one of two classes, COVID-19 or non-COVID-19. To find the best model, we evaluated several embedding techniques available for Bahasa Indonesia and five Machine Learning (ML) models: (1) XGBoost, (2) fastText, (3) LSTM, (4) Bi-LSTM and (5) IndoBERT. The results show that IndoBERT outperformed the others with an accuracy of 98%. In terms of training speed, the shallow neural network architecture implemented with the fastText library can train the model in under one second and still reach a reasonably good accuracy of 86%.

Full Text