Biomedical-named entity recognition using CUDA accelerated KNN algorithm

Manish Bali, Jude Hemanth Duraisamy, Anandaraj Shanthi Pichandi

doi:10.12928/telkomnika.v21i4.24065

Abstract

Biomedical named entity recognition (Bio-NER) is a complex and time-consuming task in natural language processing (NLP). It is widely used in information retrieval, knowledge summarization, biomolecular event extraction, and discovery applications. This paper proposes a method for recognizing and classifying named entities in the biomedical domain using machine learning (ML) techniques. Support vector machine (SVM), decision trees (DT), K-nearest neighbor (KNN), and its kernel versions are used. Recent advancements in programmable, massively parallel graphics processing units (GPUs) hold promise for addressing multi-dimensional data and time complexity through increased computational capacity at a lower cost. We implement a novel parallel version of KNN by porting the distance computation step to the GPU using the compute unified device architecture (CUDA) and compare the performance of all the algorithms on the BioNLP/NLPBA 2004 corpus. Results demonstrate that CUDA-KNN takes full advantage of the GPU’s computational capacity and multi-leveled memory architecture, resulting in a 35× performance enhancement over the central processing unit (CPU). Compared with existing research, the proposed model offers a faster NER option for higher-dimensional and larger datasets, balancing accuracy and speed-up, and provides critical design insights for developing a robust BioNLP system.
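As an illustration of why the distance computation step is the natural candidate for GPU offloading, the following is a minimal CPU-only sketch in plain Python, not the authors' CUDA implementation. The function names, the toy feature vectors, and the labels are hypothetical. The point it shows is that each query-to-training-point distance is independent of every other, so in a CUDA port each one could plausibly be assigned to its own thread, while the sort and vote steps operate on the resulting distance list.

```python
from collections import Counter


def squared_distances(query, train_points):
    """Squared Euclidean distance from one query to every training point.

    Each list entry depends only on a single (query, training point) pair.
    This independence is what makes the step data-parallel; here it is
    simply computed sequentially on the CPU.
    """
    return [sum((q - t) ** 2 for q, t in zip(query, point))
            for point in train_points]


def knn_predict(query, train_points, train_labels, k=3):
    """Classify `query` by majority vote among its k nearest neighbours."""
    dists = squared_distances(query, train_points)
    # Indices of the k smallest distances.
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors and entity labels, for illustration only.
train_points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_labels = ["O", "O", "protein", "protein", "protein"]

prediction = knn_predict((1.0, 0.9), train_points, train_labels, k=3)
```

With real token features the vectors would have far more dimensions and the training set far more rows, which is where the cost of the distance step dominates.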
