Abstract

Named-entity recognition (NER) is an information-extraction task in artificial intelligence (AI) that locates conceptual entities in natural language and classifies them into predefined categories. In this study, we apply Long Short-Term Memory (LSTM) networks to identify patient entities in the Enterprise Master Patient Index (EMPI). A sample dataset of 300,000 deidentified patient records is used to test LSTM performance for EMPI entity recognition. The data entries are first converted into strings and represented by a 200-dimensional Word2Vec model. Two LSTM models are developed for the NER problem: the first uses a multi-class classifier with a softmax function, and the second uses a two-step classification procedure based on the binary logistic function. To evaluate LSTM performance, we use a conventional deep neural network as a baseline, in which the Levenshtein distance is used to represent the training data patterns. Classification performance is evaluated by ten-fold cross-validation. The two-step LSTM model achieves a classification accuracy of 99.82%, which is superior to both the multi-class LSTM classifier (61.08%) and the conventional deep neural network (95.08%). We therefore conclude that the new two-step LSTM model provides an accurate and reliable solution for recognizing EMPI patient entities when it is properly configured and trained.
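The Levenshtein distance used for the baseline model counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following is a minimal sketch of that metric only. The function name `levenshtein` and the example strings are illustrative assumptions, and the study's actual feature construction is not shown here.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


# Hypothetical example: two spellings of the same surname differ by one edit.
distance = levenshtein("Smith", "Smyth")
```

A small distance between two field values suggests a likely misspelling of the same value, which is presumably why such a metric is useful for representing patterns in patient records.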

Highlights

  • EMPI stands for Enterprise Master Patient Index

  • An EMPI dataset of 300,000 deidentified patient entity records was acquired from Dapasoft Inc., an Ontario government contractor for the maintenance and integration of the Ontario EHR systems

  • The average classification accuracy in cross-validation is 33.33%, which implies that a Long Short-Term Memory (LSTM) classifier trained directly on character sequences cannot recognize the patient entities in the EMPI data
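The cross-validation behind these accuracy figures splits the records into folds, with each fold held out once for testing while the rest are used for training. The sketch below shows one way to produce ten such splits. The helper name `k_fold_indices`, the seed, and the record count in the example are illustrative assumptions, not details taken from the study.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Return k (train, test) index splits; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set: every index that is not in the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Hypothetical example with 25 records instead of the full dataset.
splits = k_fold_indices(25, k=10)
```

The reported accuracy would then be the mean of the per-fold accuracies obtained by training on each `train` split and scoring on the matching `test` split.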

Summary

Introduction

EMPI stands for Enterprise Master Patient Index; it is also known as the Master Patient Index (MPI). MPI matching plays an important role in multi-database and cross-system integration in electronic health systems. A recent study found that the most frequent mismatches involve missing data and misspellings in record information. A study in South Korea states that a high-quality EMPI database system can improve the performance of a health information exchange (HIE) for both general and specific purposes [2]. The semantic patterns related to the patients in the EMPI system can be learned and represented by appropriate machine learning models. A properly trained machine learning model can classify patient entities during matching in multi-source electronic health record (EHR) integration.
