Abstract

Radiology is an important assessment tool for patients with pulmonary disease. Radiology images are usually accompanied by a written report from a radiologist, which is passed on to the referring physicians. These reports are written in natural language, and their structure can vary with the language used. In our study, the radiology reports were collected from an Indonesian hospital and written in Bahasa Indonesia. We performed automatic text classification to sort the reports into two classes, COVID-19 and non-COVID-19. To find the best model, we evaluated several embedding techniques available for Bahasa Indonesia and five machine learning (ML) models: (1) XGBoost, (2) fastText, (3) LSTM, (4) Bi-LSTM, and (5) IndoBERT. The results show that IndoBERT outperformed the others with an accuracy of 98%. In terms of training speed, the shallow neural network architecture implemented in the fastText library trained in under one second and still achieved a reasonably good accuracy of 86%.
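To make the task concrete, the following is a minimal sketch of binary report classification and accuracy scoring. It is not one of the five models evaluated in the study: a small multinomial Naive Bayes classifier is swapped in as a stand-in because it needs only the Python standard library. The label names (`covid`, `non_covid`), the function names, and the toy report snippets are all assumptions made up for illustration; they are not taken from the hospital data.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization on lowercased text; real reports would
    # likely need more careful preprocessing.
    return text.lower().split()


def train_naive_bayes(samples):
    """Count documents and words per label from (text, label) pairs."""
    doc_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in samples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return {"doc_counts": doc_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for tok in tokenize(text):
            if tok not in model["vocab"]:
                continue  # ignore words never seen in training
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(model, text) == label for text, label in samples)
    return correct / len(samples)


# Hypothetical toy snippets, invented for this sketch (not study data).
TRAIN = [
    ("opasitas ground glass bilateral perifer", "covid"),
    ("konsolidasi bilateral perifer", "covid"),
    ("cor dan pulmo dalam batas normal", "non_covid"),
    ("tidak tampak infiltrat", "non_covid"),
]
TEST = [
    ("opasitas bilateral", "covid"),
    ("pulmo normal", "non_covid"),
]

model = train_naive_bayes(TRAIN)
test_accuracy = accuracy(model, TEST)
```

Accuracy here is the share of held-out reports assigned the correct class, which is the metric the abstract reports for each model. A score on four toy training lines says nothing about the figures reported in the study.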
