Evaluating global and local sequence alignment methods for comparing patient medical records

Ming Huang, Lixia Yao, Nilay D. Shah

doi:10.1186/s12911-019-0965-y


Open Access

https://doi.org/10.1186/s12911-019-0965-y


Journal: BMC Medical Informatics and Decision Making	Publication Date: Dec 1, 2019
Citations: 10	License type: open-access

Affiliation: Mayo Clinic

Abstract

Background: Sequence alignment is a way of arranging sequences (e.g., DNA, RNA, protein, natural language, financial data, or medical events) to identify the relatedness between two or more sequences and regions of similarity. For Electronic Health Records (EHR) data, sequence alignment helps to identify patients with similar disease trajectories for more relevant and precise prognosis, diagnosis and treatment.

Methods: We tested two cutting-edge global sequence alignment methods, namely dynamic time warping (DTW) and the Needleman-Wunsch algorithm (NWA), together with their local modifications, DTW for Local alignment (DTWL) and the Smith-Waterman algorithm (SWA), for aligning patient medical records. We also used 4 sets of synthetic patient medical records generated from a large real-world EHR database as gold standard data to objectively evaluate these sequence alignment algorithms.

Results: For global sequence alignments, 47 out of 80 DTW alignments and 11 out of 80 NWA alignments had higher similarity scores than the reference alignments, while the remaining 33 DTW alignments and 69 NWA alignments had the same similarity scores as the reference alignments. Forty-six out of 80 DTW alignments had better similarity scores than the NWA alignments, with the remaining 34 cases having equal similarity scores from both algorithms. For local sequence alignments, 70 out of 80 DTWL alignments and 68 out of 80 SWA alignments had larger coverage and higher similarity scores than the reference alignments, while the remaining DTWL and SWA alignments received the same coverage and similarity scores as the reference alignments. Six out of 80 DTWL alignments showed larger coverage and higher similarity scores than the SWA alignments. Thirty DTWL alignments had equal coverage but better similarity scores than SWA. DTWL and SWA received equal coverage and similarity scores for the remaining 44 cases.

Conclusions: DTW, NWA, DTWL and SWA outperformed the reference alignments. DTW (or DTWL) seems to align better than NWA (or SWA) by inserting new daily events and identifying more similarities between patient medical records. The evaluation results could provide valuable information on the strengths and weaknesses of these sequence alignment methods for future development of sequence alignment methods and patient similarity-based studies.
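
To make the global versus local distinction concrete, the following is a minimal sketch of the textbook Needleman-Wunsch (global) and Smith-Waterman (local) scoring recurrences applied to sequences of event labels. The scoring values (match = 1, mismatch = -1, gap = -1), the function names, and the toy event labels are illustrative assumptions; the paper's actual scoring scheme and record encoding are not given here.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score: both sequences must be aligned end to end."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per element.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment score: best-scoring pair of subsequences."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart anywhere.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best


# Hypothetical event sequences for two patients (labels are made up).
patient_a = ["fever", "cough", "xray", "antibiotic"]
patient_b = ["cough", "xray", "antibiotic", "discharge"]

global_score = needleman_wunsch_score(patient_a, patient_b)
local_score = smith_waterman_score(patient_a, patient_b)
```

In this toy case the local score is higher than the global score because the local method only rewards the shared run of three events, while the global method also pays a gap penalty for the unmatched event at each end.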

Highlights

Sequence alignment is a way of arranging sequences (e.g., DNA, RNA, protein, natural language, financial data, or medical events) to identify the relatedness between two or more sequences and regions of similarity
We found that the similarity scores of dynamic time warping (DTW) alignments were as good as, or even better than, those of the reference alignments
DTW alignments were better than Needleman-Wunsch algorithm (NWA) alignments in 46 cases out of 80, with the remaining 34 cases having equal similarity scores from both algorithms

Summary

Introduction

Sequence alignment is a way of arranging sequences (e.g., DNA, RNA, protein, natural language, financial data, or medical events) to identify the relatedness between two or more sequences and regions of similarity. For Electronic Health Records (EHR) data, sequence alignment helps to identify patients with similar disease trajectories for more relevant and precise prognosis, diagnosis and treatment. When calculating and comparing patient similarity from EHR data, we cannot bypass the issue of aligning the temporal event sequences [7]. Che et al. were the first to deploy dynamic time warping (DTW) to align temporal sequences when calculating patient similarity. They adopted a linear regression model trained on the subset of patients most similar to a target patient and achieved a better F1 score (77%) at predicting the target patient’s Parkinson subtype than the same model trained on all patients (75%) [8].
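
As a rough illustration of how DTW can compare temporal event sequences, here is a minimal sketch of the classic distance-minimizing DTW recurrence, followed by a helper that ranks a cohort by distance to a target patient. The 0/1 event cost, the function names, and the toy cohort are assumptions made for illustration; this is not the formulation used by Che et al. or in this paper, which reports similarity scores whose definition is not given here.

```python
def dtw_distance(a, b, cost=lambda x, y: 0 if x == y else 1):
    """Classic DTW: minimum cumulative cost over monotonic warping paths."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # An element may be matched to several consecutive elements of
            # the other sequence, which is what allows "warping" in time.
            d[i][j] = cost(a[i - 1], b[j - 1]) + min(
                d[i - 1][j], d[i][j - 1], d[i - 1][j - 1]
            )
    return d[n][m]


def most_similar(target, cohort, k):
    """Return the k patient ids whose sequences are closest to the target."""
    return sorted(cohort, key=lambda pid: dtw_distance(target, cohort[pid]))[:k]


# Hypothetical cohort: patient id -> sequence of event codes (made up).
cohort = {
    "p1": ["A", "B", "B", "C"],
    "p2": ["A", "C"],
    "p3": ["D", "D", "D"],
}
target = ["A", "A", "B", "C"]
neighbours = most_similar(target, cohort, k=2)
```

A downstream model could then be fitted on the returned neighbours only, which is the general idea behind similarity-based patient subsets described above.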

