Abstract

Natural Language Processing (NLP) techniques are used to extract information from Electronic Health Records (EHRs) in order to identify patients with distinctive clinical characteristics and to define phenotypes. Classifying imbalanced datasets is also a key concern in medical diagnosis. We built an improved framework that automates the multi-class classification of imbalanced medical transcriptions [1] into 40 medical specialties by constructing a set of important phenotypes (features). We implemented and tested five machine learning models, of which the Random Forest classifier achieved the highest performance on test data: an F1 score of 0.99 (precision 0.99, recall 0.99) and a ROC AUC score of 0.99.
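Because the classes are imbalanced, a single multi-class F1 figure depends on how per-class scores are averaged. The abstract does not state which averaging was used, so the following is only a minimal sketch, in plain Python, of how macro-averaged and support-weighted F1 can differ on a skewed label set. The specialty labels and the toy predictions are hypothetical and are not taken from the dataset or the reported results.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)   # support: how often each label is the truth
    pred_count = Counter(y_pred)   # how often each label was predicted
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    scores = {}
    for label in labels:
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, true_count[label])
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for _, _, f1, _ in scores.values()) / len(scores)


def weighted_f1(scores):
    """Mean of per-class F1 weighted by support: frequent classes dominate."""
    total = sum(support for _, _, _, support in scores.values())
    return sum(f1 * support for _, _, f1, support in scores.values()) / total


# Hypothetical toy labels: one frequent and one rare specialty.
y_true = ["Surgery"] * 8 + ["Radiology"] * 2
y_pred = ["Surgery"] * 8 + ["Surgery", "Radiology"]

scores = per_class_prf(y_true, y_pred)
macro = macro_f1(scores)
weighted = weighted_f1(scores)
print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
```

In this toy case the rare class is only half recovered, which lowers the macro average more than the weighted one. That is why the averaging scheme is worth knowing when reading a single F1 value for an imbalanced problem.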
