Abstract

Introduction: Structured databases created from electronic health records (EHRs) are crucial for cancer research. Manual data entry into databases is both labor-intensive and error-prone. We aimed to develop an artificial intelligence (AI)-driven approach for automatically inputting patient information from EHRs.

Methods: A REDCap database for 53 patients with advanced lung cancer treated with first-line immunotherapy with or without chemotherapy at Gustave Roussy between 02/2021 and 06/2023 was manually populated by physicians with demographics, risk factors, cancer history, and treatment data, with 137 variables per patient. Given unstructured medical letters and a schematic description of each variable, generative AI was used to find, quote, and process variables into a structured form. We directed the large language model's actions with prompt engineering and tailored few-shot examples. Mortality data were auto-extracted from the French public registry, INSEE. We assessed consistency between manual data entry (MDE) and automated data entry (ADE), with a secondary manual review of mismatches performed by a senior physician.

Results: In total, 7,261 data points were assessed. ADE averaged 10 minutes per patient, inclusive of quality checks, vs 1 h for MDE. The concordance rate between ADE and MDE was 90.2% (6,550/7,261), with a discordance of 6.8% (496/7,261). Data were missing in 0.8% (59/7,261) of cases for both methods, in 1.0% (73/7,261) for ADE, and in 1.1% (83/7,261) for MDE. After checking discordances, ADE correctness was 95.3% (6,922/7,261) (Table) and MDE correctness was 93.8% (6,809/7,261). Errors in ADE (207/7,261) were due to algorithm refinement needs (70%) and missing EHR information (30%).

Conclusion: Generative AI has strong potential for identifying and structuring data from EHRs, yielding consistency superior to manual entry by physicians and an 85% reduction in data entry time. This may enhance the efficiency, accuracy, and scalability of EHR-to-database conversions.
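As a minimal sketch of how the percentages in the Results follow from the reported counts, the arithmetic can be reproduced in a few lines of Python. The counts are taken from the abstract; the dictionary keys, the `pct` helper, and the reading of the two single-method missing counts as mutually exclusive categories are assumptions made for illustration, not part of the authors' pipeline.

```python
# Counts as reported in the Results; all names here are illustrative only.
TOTAL = 7261  # data points assessed

counts = {
    "concordant": 6550,        # ADE and MDE agree
    "discordant": 496,         # ADE and MDE disagree
    "missing_both": 59,        # missing for both methods
    "missing_ade": 73,         # missing for ADE (assumed: ADE only)
    "missing_mde": 83,         # missing for MDE (assumed: MDE only)
}


def pct(n, total=TOTAL):
    """Return n/total as a percentage rounded to one decimal place."""
    return round(100 * n / total, 1)


# Under the assumption above, the five categories account for every data point.
assert sum(counts.values()) == TOTAL

rates = {name: pct(n) for name, n in counts.items()}

# Correctness after the senior-physician review of mismatches.
ade_correct = pct(6922)
mde_correct = pct(6809)

for name, rate in rates.items():
    print(f"{name}: {rate}%")
print(f"ADE correctness: {ade_correct}%")
print(f"MDE correctness: {mde_correct}%")
```

The five categories summing exactly to 7,261 is what supports treating them as a partition of the assessed data points.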
ADE for >1,000 patients will be presented at the meeting.

Table. ADE correctness by variable
100%: Sex; Age; Life status; Last follow-up; Metastatic from diagnosis
95-99%: Molecular alterations; Metastatic sites; Progression sites; Treatment type
90-94%: Tobacco & cannabis consumption; Histology; PDL1 score; Progression event
80-89%: Pack years; Date of diagnosis & first metastasis; Stage; Date of start treatment

Citation Format: Mihaela Aldea, Pierre Rolland, Solenne Simon, Aliette Poplu, Muriel Wartelle, Benjamin Vignal, Jean-Charles Louis, Francois Lion, Arnaud Borie, David Planchard, Caroline Robert, Stefan Michiels, Fabrice Andre, Fabrice Barlesi, Franck Le Ouay, Benjamin Besse. Using AI to automatically process data from unstructured health records of patients with lung cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3569.