Classification of Arabic healthcare questions based on word embeddings learned from massive consultations: a deep learning approach

Hossam Faris,Manal Alomari,Pedro A Castillo,Maria Habib,Alaa Alomari,Mohammad Faris

doi:10.1007/s12652-021-02948-w

Abstract

Automated question classification is a fundamental component of automated question-answering systems, which plays a critical role in promoting medical and healthcare services. Developing an automated question classification system depends heavily on natural language processing and data mining techniques. Question classification methods based on classical machine learning techniques face limitations in capturing the hidden relationships of features, as well as, handling complex languages and very large-scale datasets. Therefore, this paper proposes a deep learning approach for question classification, since deep learning methods have the powerful capability to extract implicit, hidden relationships and automatically generate dense representations of features. The proposed question classification model depends on unidirectional and bidirectional long short-term memory networks (LSTM and BiLSTM), which essentially developed to handle the Arabic language in the field of healthcare. The features are represented and created using a domain-specific word embedding model (Word2Vec) that is constructed by training around 1.5 million medical consultations from Altibbi company. Altibbi is a telemedicine company that is used as a case study and a source for curating and collecting the data. The proposed deep learning approach is a multi-class classification algorithm that automatically labels and maps the questions into 15 categories of medical specialities. The proposed deep learning model is evaluated using several evaluation metrics, including accuracy, precision, recall, and F1-score. Markedly, the proposed model achieved a superb classification capacity in terms of classification accuracy rate, which gained 87.2%.

Full Text