A compressed large language model embedding dataset of ICD 10 CM descriptions

Michael J Kane, Casey King, Denise Esserman, Nancy K Latham, Erich J Greene, David A Ganz

doi:10.1186/s12859-023-05597-2


Open Access

https://doi.org/10.1186/s12859-023-05597-2


Abstract


This paper presents novel datasets that provide numerical representations of ICD-10-CM codes. The representations are generated by embedding each code's description with a large language model and then reducing the dimension of the embeddings with an autoencoder. The embeddings serve as informative input features for machine learning models by capturing relationships among categories and preserving inherent context information. The model generating the data was validated in two ways: first, the dimension reduction was validated using an autoencoder; second, a supervised model was created to estimate the ICD-10-CM hierarchical categories. Results show that the data can be reduced to as few as 10 dimensions while maintaining the ability to reproduce the original embeddings, with fidelity decreasing as the dimension of the reduced representation decreases. Multiple compression levels are provided, so users can choose the level that fits their requirements, then download and use the data without any other setup. The readily available datasets of ICD-10-CM codes are anticipated to be highly valuable for researchers in biomedical informatics, enabling more advanced analyses in the field. This approach has the potential to significantly improve the utility of ICD-10-CM codes in the biomedical domain.
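The compression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture or training setup: it assumes a linear autoencoder trained by plain gradient descent, and it uses synthetic 32-dimensional vectors as a stand-in for large language model embeddings of code descriptions. The function name `train_linear_autoencoder`, the bottleneck sizes, and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


def train_linear_autoencoder(X, k, steps=3000, lr=0.01, seed=0):
    """Fit a linear autoencoder with a k-dimensional bottleneck.

    Hypothetical helper for illustration only. Returns (W_enc, W_dec, mse),
    where mse is the mean squared reconstruction error per embedding entry.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = 0.1 * rng.normal(size=(d, k))  # encoder: d -> k
    W_dec = 0.1 * rng.normal(size=(k, d))  # decoder: k -> d
    for _ in range(steps):
        Z = X @ W_enc                      # compressed codes
        R = Z @ W_dec - X                  # reconstruction residual
        # Gradients of the loss sum(R**2) / n with respect to each matrix.
        grad_dec = (2.0 / n) * Z.T @ R
        grad_enc = (2.0 / n) * X.T @ (R @ W_dec.T)
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
    return W_enc, W_dec, mse


# Synthetic stand-in for description embeddings (an assumption, not real data):
# 300 "codes", 32 dimensions, driven by 8 underlying factors plus small noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(300, 8))
mixing = rng.normal(size=(8, 32))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 32))
X = X - X.mean(axis=0)  # center each dimension
X = X / X.std()         # rescale to unit overall variance

# Train one autoencoder per compression level and record the fidelity.
errors = {}
for k in (2, 4, 8):
    W_enc, W_dec, errors[k] = train_linear_autoencoder(X, k)
    print(f"bottleneck={k}: reconstruction MSE={errors[k]:.4f}")

# Compressed features from the last model (k=8), usable as model inputs.
codes = X @ W_enc
```

In this toy setting the reconstruction error grows as the bottleneck shrinks, which mirrors the trade-off the abstract reports between compression level and fidelity. A nonlinear autoencoder fitted to real embeddings would be needed to reproduce anything like the published datasets.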


Journal: BMC Bioinformatics	Publication Date: Dec 17, 2023
Citations: 3	License type: CC BY 4.0


