Abstract

Clinical abbreviations are often ambiguous, which poses a challenge for clinical decision support systems. Clinical datasets also frequently suffer from class imbalance, which leads to classifiers with low accuracy. Traditional feature-engineered models struggle with this task, and class imbalance is a known factor that degrades the performance of neural network techniques. This study proposes an attention-based bidirectional long short-term memory (Bi-LSTM) model to improve clinical abbreviation disambiguation in clinical documents. To address limited training data and class imbalance, we create a more balanced dataset using two data generation techniques: reverse substitution and data augmentation with synonym substitution. A Bi-LSTM classification model with an attention mechanism then disambiguates each abbreviation, and the model's performance is evaluated by accuracy for each abbreviation. The evaluation results demonstrate that our data balancing technique significantly improves the model's accuracy by 2.08%. Furthermore, the proposed attention-based Bi-LSTM model achieves an accuracy of 96.09% on the UMN dataset, outperforming state-of-the-art results. These results indicate that deep neural network methods, particularly Bi-LSTM, offer a promising alternative to traditional feature-engineered models for clinical abbreviation disambiguation, and that data generation techniques can address the challenges posed by limited-resource, imbalanced clinical datasets, leading to a significant improvement in disambiguation accuracy.
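The two data generation steps named above can be illustrated with a minimal sketch, assuming a toy corpus in which abbreviation senses appear written out in full. Reverse substitution replaces each long form with its abbreviation and keeps the long form as the sense label; synonym substitution then creates extra copies of minority-sense examples until every sense has as many examples as the majority sense. The sense inventory, synonym table, corpus, and the function names `reverse_substitute`, `synonym_substitute`, and `balance` are hypothetical illustrations, not the paper's implementation, and the attention-based Bi-LSTM classifier itself is not shown.

```python
import random
import re
from collections import Counter

# Hypothetical sense inventory (long form -> abbreviation); a toy example, not from the paper.
SENSES = {
    "blood pressure": "BP",
    "bullous pemphigoid": "BP",
}

# Hypothetical synonym table used for augmentation; a toy example, not from the paper.
SYNONYMS = {
    "elevated": ["raised", "high"],
    "patient": ["pt"],
    "skin": ["cutaneous"],
}


def reverse_substitute(sentences, senses):
    """Replace each long form with its abbreviation, keeping the long form as the label."""
    examples = []
    for sent in sentences:
        for long_form, abbr in senses.items():
            pattern = re.compile(r"\b" + re.escape(long_form) + r"\b", re.IGNORECASE)
            if pattern.search(sent):
                examples.append((pattern.sub(abbr, sent), long_form))
    return examples


def synonym_substitute(text, synonyms, rng):
    """Swap each whitespace-separated word that has synonyms for a random synonym.

    Punctuation stays attached to words, so e.g. "elevated." is not matched.
    """
    words = text.split()
    return " ".join(
        rng.choice(synonyms[w.lower()]) if w.lower() in synonyms else w for w in words
    )


def balance(examples, synonyms, seed=0):
    """Oversample minority senses with synonym-substituted copies up to the majority count."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in examples)
    target = max(counts.values())
    balanced = list(examples)
    for label, count in counts.items():
        pool = [text for text, lab in examples if lab == label]
        for i in range(target - count):
            balanced.append((synonym_substitute(pool[i % len(pool)], synonyms, rng), label))
    return balanced


corpus = [
    "Patient blood pressure was elevated on admission.",
    "Blood pressure remained stable overnight.",
    "Skin biopsy confirmed bullous pemphigoid.",
]
data = reverse_substitute(corpus, SENSES)
balanced = balance(data, SYNONYMS)
print(Counter(label for _, label in balanced))
```

Oversampling every sense up to the majority count is only one possible balancing policy; the abstract does not state the target class distribution used in the study.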
