Developing a Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research: Retrospective Study.

Mina Kim, Soo-Yong Shin, Byoung-Kee Yi, Dong Kyung Chang, Mira Kang

doi:10.2196/14083

Abstract

Background: Data standardization is essential in electronic health records (EHRs) for both clinical practice and retrospective research. However, it is still not easy to standardize EHR data because of nonidentical duplicates, typographical errors, and inconsistencies. To overcome this drawback, standardization efforts have been undertaken both to collect data in a standardized format and to curate the data already stored in EHRs. To perform clinical big data research, the stored EHR data should be standardized, starting from laboratory results, given their importance. However, most previous efforts have relied on labor-intensive manual methods.

Objective: We aimed to develop an automatic standardization method that removes noise from categorical laboratory data, groups the cleaned data, and maps them to standard terminology.

Methods: We developed a method called standardization algorithm for laboratory test–categorical result (SALT-C) that can process categorical laboratory data, such as pos +, 250 4+ (urinalysis results), and reddish (urinalysis color results). SALT-C consists of five steps. First, it applies data cleaning rules to categorical laboratory data. Second, it categorizes the cleaned data into 5 predefined groups (urine color, urine dipstick, blood type, presence-finding, and pathogenesis tests). Third, all data in each group are vectorized. Fourth, similarity is calculated between the vectors of data and those of each value in the predefined value sets. Finally, the value closest to the data is assigned.

Results: The performance of SALT-C was validated using 59,213,696 data points (167,938 unique values) generated over 23 years at a tertiary hospital. Apart from data whose original meaning could not be interpreted correctly (eg, ** and _^), SALT-C mapped unique raw data to the correct reference value for each group with an accuracy of 97.6% (123/126; urine color tests), 97.5% (198/203; urine dipstick tests), 95% (53/56; blood type tests), 99.68% (162,291/162,805; presence-finding tests), and 99.61% (4643/4661; pathogenesis tests).

Conclusions: The proposed SALT-C successfully standardized the categorical laboratory test results with high reliability. SALT-C can benefit clinical big data research by reducing laborious manual standardization efforts.
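The five steps described in the Methods can be illustrated with a minimal sketch. The abstract does not state which cleaning rules, vectorization scheme, or similarity measure SALT-C uses, so everything below is an assumption made for illustration: character bigram counts as the vectors, cosine similarity as the measure, a hypothetical urine color value set, a hypothetical `min_similarity` cutoff, and a grouping step that is taken as already done (the caller passes the value set for the group).

```python
import math
import re
from collections import Counter
from typing import Optional

# Hypothetical reference values for one group (urine color); not taken from the paper.
URINE_COLOR_VALUES = ["yellow", "red", "brown", "colorless", "amber"]


def clean(raw: str) -> str:
    """Step 1 (assumed rules): lowercase, drop stray symbols, collapse whitespace."""
    text = re.sub(r"[^a-z0-9+\- ]", " ", raw.lower())
    return re.sub(r"\s+", " ", text).strip()


def vectorize(text: str, n: int = 2) -> Counter:
    """Step 3 (assumed scheme): bag of character n-grams, padded at the edges."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Step 4 (assumed measure): cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def standardize(raw: str, value_set: list, min_similarity: float = 0.3) -> Optional[str]:
    """Step 5: assign the closest reference value, or None if nothing is close enough.

    Step 2 (grouping) is assumed to have happened already: the caller
    supplies the value set that belongs to the test's group.
    """
    vec = vectorize(clean(raw))
    best = max(value_set, key=lambda v: cosine(vec, vectorize(v)))
    return best if cosine(vec, vectorize(best)) >= min_similarity else None


print(standardize("Reddish", URINE_COLOR_VALUES))
print(standardize("YELLOW!!", URINE_COLOR_VALUES))
print(standardize("**", URINE_COLOR_VALUES))
```

In this sketch, noisy inputs such as `Reddish` or `YELLOW!!` resolve to a reference value, while an uninterpretable input such as `**` falls below the cutoff and is left unmapped, mirroring the way the Results set such data aside.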

Highlights

As the volume of digitized medical data generated in real-world clinical settings increases explosively owing to the wide adoption of electronic health records (EHRs), there are mounting expectations that such data offer an opportunity to find high-quality medical evidence and improve health-related decision making and patient outcomes [1,2,3,4,5,6].
Some value sets are publicly available from the Systematized Nomenclature of Medicine (SNOMED), Logical Observation Identifiers Names and Codes (LOINC), and the Value Set Authority Center, but they are scattered, so an integrated dictionary is required to identify the spectrum of categorical laboratory data.
We developed the standardization algorithm for laboratory test–categorical result (SALT-C), an algorithm that supports mapping of categorical laboratory data to SNOMED Clinical Terms (SNOMED CT), and applied it to a large, long-period EHR system database.

Summary

Introduction

Background

As the volume of digitized medical data generated in real-world clinical settings increases explosively owing to the wide adoption of electronic health records (EHRs), there are mounting expectations that such data offer an opportunity to find high-quality medical evidence and improve health-related decision making and patient outcomes [1,2,3,4,5,6]. Interest is growing in multi-institutional studies that strengthen analyses by standardizing EHR data from multiple institutions [16,17,18,19,20,21], such as those of the Observational Health Data Sciences and Informatics [13], the National Patient-Centered Clinical Research Network [14], and the Electronic Medical Records and Genomics network [15]. It is still not easy to standardize EHR data because of nonidentical duplicates, typographical errors, and inconsistencies. To overcome this drawback, standardization efforts have been undertaken both to collect data in a standardized format and to curate the data already stored in EHRs. To perform clinical big data research, the stored EHR data should be standardized, starting from laboratory results, given their importance. Most previous efforts have relied on labor-intensive manual methods.



Journal: JMIR Medical Informatics
Publication Date: Aug 29, 2019
Citations: 13
License type: CC BY
