Bi-directional long short term memory using recurrent neural network for biological entity recognition

Rashmi Siddalingappa, Kanagaraj Sekar

doi:10.11591/ijai.v11.i1.pp89-101

Abstract

<p>Biomedical named entity recognition (Bio-NER) aims to identify medical entities in unstructured data. A central task in curating biological databases is handling biomedical terms such as cancer types, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), gene and protein names, and others. However, because online medical repositories are massive, processing the data with a gazetteer becomes a challenge when proper annotation is missing. Traditional NER systems depend on feature engineering, which is tedious and time-consuming. This study presents a new model for Bio-NER based on a recurrent neural network (RNN). Unlike existing approaches, the proposed method uses bidirectional traversal with GloVe vector modelling performed at the character and word levels. Bio-NER is performed in three stages. First, the relevant medical entities in electronic medical records from PubMed are extracted using the skip-gram model. Second, a vector representation of each word is created through the one-hot method. Third, the weights of the RNN layers are adjusted using backpropagation. Finally, the long short-term memory (LSTM) cells store the previously encountered medical entity to handle context dependency. Accuracy and F-score are calculated for each medical entity type. The macro-averaged recall (MacroR), precision (MacroP), and F-score (MacroF) are 0.86, 0.88, and 0.87, respectively. The overall accuracy achieved was 94%.</p>
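To make the reported MacroP, MacroR, and MacroF concrete, the following is a minimal sketch of macro-averaging over entity types. It is not the authors' evaluation code: the function name `macro_scores`, the token-level matching, and the toy labels are assumptions for illustration, and the abstract does not say whether MacroF is the mean of per-type F-scores (used here) or the harmonic mean of MacroP and MacroR.

```python
from collections import Counter


def macro_scores(gold, pred):
    """Return (macro_precision, macro_recall, macro_f1) for two label sequences.

    Hypothetical helper: each label is scored separately, then the
    per-label scores are averaged with equal weight (macro-averaging).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true label and was missed

    labels = sorted(set(gold) | set(pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels, invented for illustration only (not data from the study).
gold = ["gene", "gene", "protein", "DNA", "DNA", "RNA"]
pred = ["gene", "protein", "protein", "DNA", "RNA", "RNA"]
macro_p, macro_r, macro_f = macro_scores(gold, pred)
print(f"MacroP={macro_p:.2f} MacroR={macro_r:.2f} MacroF={macro_f:.2f}")
```

Because every entity type contributes equally to a macro-average, a rare type such as a specific cancer name weighs as much as a frequent one, which is why macro scores can sit well below an overall accuracy figure.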
