Abstract

This study developed and validated a rule-based classification algorithm for prediabetes risk detection using natural language processing from home care nursing notes. First, we developed prediabetes-related symptomatic terms in English and Korean. Second, we used natural language processing to preprocess the notes. Third, we created a rule-based classification algorithm with 31 484 notes, excluding 315 instances of missing data. The final algorithm was validated by measuring accuracy, precision, recall, and the F1 score against a gold standard testing set (400 notes). The developed terms comprised 11 categories and 1639 words in Korean and 1181 words in English. Using the rule-based classification algorithm, 42.2% of the notes comprised one or more prediabetic symptoms. The algorithm achieved high performance when applied to the gold standard testing set. We proposed a rule-based natural language processing algorithm to optimize the classification of the prediabetes risk group, depending on whether the home care nursing notes contain prediabetes-related symptomatic terms. Tokenization based on white space and the rule-based algorithm were brought into effect to detect the prediabetes symptomatic terms. Applying this algorithm to electronic health records systems will increase the possibility of preventing diabetes onset through early detection of risk groups and provision of tailored intervention.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.