Abstract

Correctly recognizing and extracting named entities, such as disease names, medical measurements and therapies, from online medical diagnosis data remains challenging. On the one hand, conventional natural language processing (NLP) methods cannot be directly applied in the field of online medical diagnosis (OMD). Although existing supervised and unsupervised learning algorithms offer strategies for OMD on open websites, such methods may rely heavily on specific knowledge sources or manually designed features. On the other hand, because of the large size of the data and the complexity of its structure, it is difficult to build a robust NLP model for recognizing and extracting clinical named entities in paragraphs. Therefore, in this paper, we propose a new deep neural network (DNN) that combines bi-directional long short-term memory (Bi-LSTM) with a conditional random field (CRF) and uses online medical diagnosis data to recognize and extract clinical named entities. Unlike existing artificial neural networks (ANN), the proposed network functions well even without manual rules or features. The word representations fed into the DNN are formed by concatenating character-based representations with continuous bags of word clusters (CBOWC) embeddings. We test the new DNN on different online medical diagnosis datasets obtained with a scalable web crawler, and compare it with the long short-term memory (LSTM) neural network language model, the convolutional neural network (CNN) model and the most advanced Bi-LSTM-CRF. These baselines were selected because they have been reported to produce acceptable results. The comparative analysis shows that the proposed DNN is a reliable tool and improves on every benchmark for OMD.
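As a rough illustration of the two ideas named above, the following sketch shows (1) concatenating a word-level vector with a character-based vector to form one input representation and (2) the Viterbi decoding step that a CRF layer typically uses to pick the best tag sequence. It is not the paper's implementation: the function names, the tag set, and all scores are hypothetical, and in a real Bi-LSTM-CRF the emission scores would come from the Bi-LSTM and the transition scores would be learned.

```python
from typing import List, Sequence


def concat_representation(word_vec: Sequence[float],
                          char_vec: Sequence[float]) -> List[float]:
    """Join a word-level vector and a character-based vector end to end."""
    return list(word_vec) + list(char_vec)


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   : score of tag j at token position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    n_tags = len(emissions[0])
    score = list(emissions[0])          # best score ending in each tag so far
    backptr: List[List[int]] = []       # best previous tag for each position
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backptr.append(ptrs)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptr):
        best = ptrs[best]
        path.append(best)
    path.reverse()
    return path


# Hypothetical tag set and hand-written scores for a three-token sentence.
TAGS = ["O", "B-DIS", "I-DIS"]
emissions = [
    [1.0, 0.2, 0.0],
    [0.1, 0.9, 1.0],   # taken alone, this token slightly prefers I-DIS
    [0.1, 0.0, 1.0],
]
transitions = [
    [0.0, 0.0, -10.0],  # O -> I-DIS is strongly penalized
    [0.0, 0.0, 1.0],    # B-DIS -> I-DIS is encouraged
    [0.0, 0.0, 0.5],
]
decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
```

With these made-up numbers, choosing the best tag per token independently would label the second token `I-DIS` directly after `O`, whereas the sequence-level decoding yields `O`, `B-DIS`, `I-DIS`. This is the usual motivation for placing a CRF layer on top of per-token scores.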
