Edit Distance Research Articles

Background: The impression section integrates key findings of a radiology report but can be subjective and variable. We sought to fine-tune and evaluate an open-source Large Language Model (LLM) in automatically generating impressions from the remainder of a radiology report across different imaging modalities and hospitals.

Methods: In this institutional review board-approved retrospective study, we collated a dataset of CT, US, and MRI radiology reports from the University of California San Francisco Medical Center (UCSFMC) (n = 372,716) and the Zuckerberg San Francisco General (ZSFG) Hospital and Trauma Center (n = 60,049), both under a single institution. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score, an automatic natural language evaluation metric that measures word overlap, was used for automatic natural language evaluation. A reader study with five cardiothoracic radiologists was performed to more strictly evaluate the model’s performance on a specific modality (CT chest exams) with a radiologist subspecialist baseline. We stratified the results of the reader performance study based on the diagnosis category and the original impression length to gauge case complexity.

Results: The LLM achieved ROUGE-L scores of 46.51, 44.2, and 50.96 on UCSFMC and upon external validation, ROUGE-L scores of 40.74, 37.89, and 24.61 on ZSFG across the CT, US, and MRI modalities respectively, implying a substantial degree of overlap between the model-generated impressions and impressions written by the subspecialist attending radiologists, but with a degree of degradation upon external validation. In our reader study, the model-generated impressions achieved overall mean scores of 3.56/4, 3.92/4, 3.37/4, 18.29 s, 12.32 words, and 84 while the original impression written by a subspecialist radiologist achieved overall mean scores of 3.75/4, 3.87/4, 3.54/4, 12.2 s, 5.74 words, and 89 for clinical accuracy, grammatical accuracy, stylistic quality, edit time, edit distance, and ROUGE-L score respectively. The LLM achieved the highest clinical accuracy ratings for acute/emergent findings and on shorter impressions.

Conclusions: An open-source fine-tuned LLM can generate impressions to a satisfactory level of clinical accuracy, grammatical accuracy, and stylistic quality. Our reader performance study demonstrates the potential of large language models in drafting radiology report impressions that can aid in streamlining radiologists’ workflows.
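The abstract above reports two text-similarity measures, an edit distance counted in words and a ROUGE-L score. The following is a minimal sketch of how such measures are commonly computed, not the study's actual evaluation code. The function names, whitespace tokenization, unit edit costs, and the balanced F-measure for ROUGE-L are all assumptions made for illustration; the abstract does not state which variants were used.

```python
def word_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over whitespace-separated words (assumed tokenization)."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))  # distance from the empty prefix of x
    for i, xi in enumerate(x, 1):
        cur = [i]
        for j, yj in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                # delete xi
                           cur[j - 1] + 1,             # insert yj
                           prev[j - 1] + (xi != yj)))  # substitute or match
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as a balanced F-measure over the longest common subsequence (assumed variant)."""
    c, r = candidate.split(), reference.split()
    prev = [0] * (len(r) + 1)
    for ci in c:
        cur = [0]
        for j, rj in enumerate(r, 1):
            cur.append(prev[j - 1] + 1 if ci == rj else max(prev[j], cur[j - 1]))
        prev = cur
    lcs = prev[-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example strings, not taken from the study's data.
draft = "no acute findings"
final = "no acute cardiopulmonary findings"
print(word_edit_distance(draft, final))
print(round(100 * rouge_l_f1(draft, final), 2))
```

Scores such as 46.51 in the abstract suggest a 0 to 100 scale, which is why the usage lines multiply by 100; that scaling is likewise an assumption.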

An elastic-degenerate (ED) string is a sequence of $n$ finite sets of strings of total length $N$, introduced to represent a set of related DNA sequences, also known as a pangenome. The ED string matching (EDSM) problem consists in reporting all occurrences of a pattern of length $m$ in an ED text. The EDSM problem has recently received some attention by the combinatorial pattern matching community, culminating in an $\tilde{\mathcal{O}}(nm^{\omega-1}) + \mathcal{O}(N)$-time algorithm [Bernardini et al., SIAM J. Comput. 2022], where $\omega$ denotes the matrix multiplication exponent and the $\tilde{\mathcal{O}}(\cdot)$ notation suppresses polylog factors. In the $k$-EDSM problem, the approximate version of EDSM, we are asked to report all pattern occurrences with at most $k$ errors. $k$-EDSM can be solved in $\mathcal{O}(k^2 mG + kN)$ time, under edit distance, or $\mathcal{O}(kmG + kN)$ time, under Hamming distance, where $G$ denotes the total number of strings in the ED text [Bernardini et al., Theor. Comput. Sci. 2020]. Unfortunately, $G$ is only bounded by $N$, and so even for $k = 1$, the existing algorithms run in $\Omega(mN)$ time in the worst case. In this paper we make progress in this direction. We show that 1-EDSM can be solved in $\mathcal{O}((nm^2 + N)\log m)$ or $\mathcal{O}(nm^3 + N)$ time under edit distance. For the decision version of the problem, we present a faster $\mathcal{O}(nm^2\sqrt{\log m} + N\log\log m)$-time algorithm. We also show that 1-EDSM can be solved in $\mathcal{O}(nm^2 + N\log m)$ time under Hamming distance. Our algorithms for edit distance rely on non-trivial reductions from 1-EDSM to special instances of classic computational geometry problems (2d rectangle stabbing or 2d range emptiness), which we show how to solve efficiently. In order to obtain an even faster algorithm for Hamming distance, we rely on employing and adapting the $k$-errata trees for indexing with errors [Cole et al., STOC 2004]. This is an extended version of a paper presented at LATIN 2022.
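To make the problem statement concrete, here is a naive baseline for $k$-EDSM under Hamming distance. It is only an illustrative sketch and is not any of the algorithms from the paper or the cited works: it carries partial alignments across segment boundaries and compares characters one by one. The function name, the representation of the ED text as a list of lists of strings, and the convention of reporting the indices of segments in which some occurrence ends are all assumptions. A non-empty pattern is assumed.

```python
from typing import List, Sequence


def k_edsm_hamming(ed_text: Sequence[Sequence[str]], pattern: str, k: int) -> List[int]:
    """Return indices of segments where an occurrence with at most k mismatches ends."""
    m = len(pattern)
    # Each state (j, e): pattern[:j] is aligned up to the previous segment
    # boundary with e mismatches so far, where 0 < j < m.
    active = set()
    ends = []
    for i, segment in enumerate(ed_text):
        carried = set()
        found = False
        for s in segment:
            # Continue carried alignments at offset 0, or start fresh at any offset of s.
            starts = [(0, j, e) for (j, e) in active]
            starts += [(off, 0, 0) for off in range(len(s))]
            for pos, j, e in starts:
                while pos < len(s) and j < m and e <= k:
                    e += s[pos] != pattern[j]  # count a mismatch
                    pos += 1
                    j += 1
                if e > k:
                    continue  # too many mismatches, drop this alignment
                if j == m:
                    found = True  # the whole pattern was aligned inside segment i
                else:
                    carried.add((j, e))  # s ran out first, carry on to the next segment
        if found:
            ends.append(i)
        active = carried
    return ends


# Hypothetical ED text: segment 2 contains the empty string, so it may be skipped.
text = [["A", "C"], ["GT"], ["", "A"], ["CA"]]
print(k_edsm_hamming(text, "AGTC", 0))  # prints [3]
print(k_edsm_hamming(text, "AGTC", 1))  # prints [2, 3]
```

Because every string of every segment is checked against every carried state, the running time of this sketch grows with the total number of strings $G$, which is the dependence the abstract identifies as the bottleneck of earlier approaches.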

Articles published on Edit Distance

High throughput edit distance computation on FPGA-based accelerators using HLS

Pre-trained models for linking process in data washing machine

Train & Constrain: Phonologically Informed Tongue Twister Generation from Topics and Paraphrases

Evaluating Sequence Alignment Tools for Antimicrobial Resistance Gene Detection in Assembly Graphs

BWBEV: A Bitwise Query Processing Algorithm for Approximate Prefix Search

Entropy formulae on Feldman–Katok metric of random dynamical systems

Research on 3D geographic entity recognition method based on the double matching degree

Lossless Approximate Pattern Matching: Automated Design of Efficient Search Schemes.

An open-source fine-tuned large language model for radiological impression generation: a multi-reader performance study

Elastic-Degenerate String Matching with 1 Error or Mismatch

Understanding the effectiveness of automated feedback: Using process data to uncover the role of behavioral engagement

Scene Chinese Recognition with Local and Global Attention

Video text tracking with transformer-based local search

On Computing the k-Shortcut Fréchet Distance

Attention-Based Deep Spiking Neural Networks for Temporal Credit Assignment Problems.

Stability from graph symmetrization arguments in generalized Turán problems

GraphSlimmer: Preserving Read Mappability with the Minimum Number of Variants.

Learning locality-sensitive bucketing functions.

Median and small parsimony problems on RNA trees.

Predicting redox potentials by graph-based machine learning methods.
