Abstract Introduction: The purpose of this study is to develop custom, open-source, LLM-enabled pipelines and agents for the extraction and standardization of data from free-text medical records.

Procedures: This project aims to create a versatile tool for healthcare data analysis, with initial development utilizing our extensive collection (over 10,000 documents) of renal cell carcinoma pathology reports. These reports are an ideal starting point as they cover over 20 years of care and comprise a racially and ethnically diverse cohort, with over a quarter of patients from traditionally underrepresented backgrounds. The jargon-abundant and highly technical nature of kidney cancer pathology represents a major challenge. We will accomplish this objective using an approach that balances harnessing advanced AI capabilities with ensuring practicality and efficiency in real-world applications. First, we will leverage the capabilities of GPT-3.5/4 to generate a comprehensive labeled dataset (referred to as pseudo-labels: machine-applied labels that will be used for training a new model) tailored for pathology, radiology, and clinician reports of kidney cancer patients. Second, we will harness the pseudo-labeled datasets to train a smaller, open-source, yet highly efficient model for our specific needs (termed model distillation). Llama 2, known for its high performance in diverse tasks, is our model of choice. Its open-source nature and relatively modest computational requirements make it ideal for deployment within medical institutions. The ability to run locally ensures the protection of patient health information and reduces reliance on costlier, proprietary models. The rationale is threefold: first, to benefit from the superior performance and reasoning capabilities of GPT models; second, to create a streamlined model that is resource-efficient and tailored to our specific requirements; third, to make the final streamlined model freely available by utilizing open-source LLMs (e.g., Llama 2). To assess the feasibility of our methodology, we evaluated the ability of GPT-3.5 to label our data with high accuracy.

Data Summary: Our pilot set of pathology reports consisted of 109 documents and 3 extraction fields: histology type, biopsy site, and biopsy procedure. Using GPT-3.5, we achieved an accuracy of over 95% in each extracted field type, thereby demonstrating its ability to act as a proficient "teacher" model for our future open-source "student" model.

Conclusions: Our preliminary results suggest that current state-of-the-art LLMs are highly proficient in extracting and standardizing information from clinical reports. Building out a full, open-source, LLM-enabled pipeline will increase the accuracy, flexibility, and efficiency of medical data extraction across institutions and lead to quicker and more informed decision-making in patient care and research.

Citation Format: David Hein, Alana Christie, Hua Zhong, Ellen Araj, James Brugarolas, Lindsay Cowell, Payal Kapur, Andrew Jamieson. Learning Llama Agents for medical record analysis and standardization [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 7390.
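As a concrete illustration of the pseudo-labeling ("teacher") step described above, the sketch below shows how a single free-text pathology report could be labeled for the three pilot fields (histology type, biopsy site, biopsy procedure). It is a minimal sketch assuming the OpenAI Python client and a GPT-3.5 chat model; the prompt wording, JSON key names, and the pseudo_label helper are illustrative assumptions, not the authors' actual pipeline code.

```python
# Sketch of the teacher-model pseudo-labeling step (assumed details, not the
# authors' pipeline): one GPT-3.5 call per report, returning structured JSON.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

EXTRACTION_PROMPT = (
    "You are extracting structured data from a renal cell carcinoma pathology report. "
    'Return a JSON object with exactly these keys: "histology_type", '
    '"biopsy_site", "biopsy_procedure". '
    'Use "not reported" when a field is absent from the report.'
)

def pseudo_label(report_text: str) -> dict:
    """Ask the teacher model to label one free-text pathology report."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",          # teacher model; exact version is an assumption
        temperature=0,                   # deterministic labels for a training set
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": EXTRACTION_PROMPT},
            {"role": "user", "content": report_text},
        ],
    )
    return json.loads(response.choices[0].message.content)

# The resulting (report, label) pairs would then form the pseudo-labeled dataset
# used to fine-tune the smaller open-source "student" model (e.g., Llama 2).
```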