Clinical notes classification system for automated identification of diabetic patients: Hybrid approach integrating rules, information extraction and machine learning

Jonathan Zavala-Díaz,Enrique Reyes-Archundia,J Eduardo Alcaraz-Chávez,Adriana C Téllez-Anguiano,Juan C Olivares-Rojas,José A Gutiérrez-Gnecchi

doi:10.3233/jifs-219375

Abstract

Efficient medical information management is essential in today’s healthcare, significantly to automate diagnoses of chronic diseases. This study focuses on the automated identification of diabetic patients through a clinical note classification system. This innovative approach combines rules, information extraction, and machine learning algorithms to promise greater accuracy and adaptability. Initially, the four algorithms evaluated showed similar performance, with Gradient Boosting standing out with an accuracy of 0.999. They were tested on our clinical and oncology notes, where SVM excelled in correctly labeling non-oncology notes with a 0.99. Gradient Boosting had the best average with 0.966. The combination of rules, information extraction, and Random Forest provided the best average performance, significantly improving the classification of clinical notes and reducing the margin of error in identifying diabetic patients. The principal contribution of this research lies in the pioneering integration of rule-based methods, information extraction techniques, and machine learning algorithms for enhanced accuracy in diabetic patient identification. For future work, we consider implementing these algorithms in natural clinical settings to evaluate their practical performance. Additionally, additional approaches will be explored to improve the accuracy and applicability of clinical note-grading systems in healthcare.

Full Text