261

Background: Clinicians use next-generation sequencing (NGS) to tailor targeted treatments for patients with prostate cancer (PCa). However, the results are often delivered in unstructured formats and are saturated with technical jargon, making them inaccessible to patients. We therefore aimed to assess how accurately large language models (LLMs) extract genomic characteristics from unstructured reports and whether they can generate patient-friendly summaries that ease variant interpretation for patients with PCa.

Methods: This retrospective study included patients with PCa who underwent NGS from 2023 to 2024. The GPT-4 model was used with structured zero-shot prompts to extract genomic characteristics from unstructured reports in a non-decomposed manner. Prompts were developed iteratively using a 2% random sample of the dataset. Genomic characteristics extracted by the LLM were evaluated against a held-out, vendor-curated test dataset. Performance was assessed using weighted evaluation metrics (precision, recall, and F1-score). Additionally, the extracted variables were organized using rule-based criteria and presented in a separate prompt to GPT-4 to generate interpretation summaries for each biologically relevant variant. Mean Flesch-Kincaid scores and vocd-D statistics, with standard deviations, were computed to assess the readability and lexical diversity of the LLM-generated variant interpretation summaries.

Results: A total of 331 patients (370 NGS reports) were included in the study. Evaluation on the held-out test dataset (Table) showed that weighted average precision ranged from 80% for tumor mutational burden to 100% for ctDNA tumor fraction. Weighted average recall ranged from 85% for ctDNA tumor fraction to 99% for variant/alteration type. Weighted average F1-scores ranged from 0.83 for tumor mutational burden to 1.0 for specimen site.
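The weighted metrics reported above can be illustrated with a minimal sketch. The abstract does not state how the weighting was done; the code assumes the common convention of averaging per-class scores weighted by class support (the number of true instances of each class). The function name `weighted_prf` and the MSI-status labels are hypothetical and serve only as an illustration, not as the study's data or pipeline.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Return support-weighted precision, recall, and F1.

    Assumption: each class is weighted by its share of the true labels.
    """
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n_true in support.items():
        # True positives and predicted count for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        p_c = tp / n_pred if n_pred else 0.0
        r_c = tp / n_true
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = n_true / total
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Hypothetical labels: curated reference vs. model-extracted MSI status.
reference = ["MSS", "MSS", "MSS", "MSI-H"]
extracted = ["MSS", "MSS", "MSI-H", "MSI-H"]
p, r, f = weighted_prf(reference, extracted)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.875 0.75 0.767
```

Under this convention the weighted F1 is the weighted mean of per-class F1 values, so it need not equal the harmonic mean of the weighted precision and weighted recall.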
The mean Flesch-Kincaid score was 8.29 ± 1, indicating average (standard) readability. The mean vocd-D statistic was 7.55 ± 1, indicating focused summaries with limited lexical diversity.

Conclusions: This study suggests that large language models can effectively extract genomic characteristics from unstructured reports and could improve genomic literacy among patients with prostate cancer by providing easy-to-interpret variant summaries. However, targeted prompting may be required to increase lexical diversity and improve engagement.

Table. Performance across different genomic characteristics.

Category                   Precision  Recall  F1-Score
Altered gene               0.86       0.97    0.89
Variant/alteration type    0.98       0.99    0.98
Variant allele fraction    0.96       0.96    0.96
Test name/type             1          0.99    1
Specimen site              1          0.99    1
Tumor mutational burden    0.8        0.93    0.83
ctDNA tumor fraction       1          0.85    0.92
MSI status                 0.99       0.97    0.98
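As a rough illustration of the readability measure, the sketch below computes a Flesch-Kincaid grade level. The abstract does not specify whether the reported score is the grade level or the reading-ease variant, nor which tool was used; the grade-level formula is assumed here. The syllable counter is a crude vowel-group heuristic, and the function names are hypothetical.

```python
import re


def count_syllables(word):
    """Crude heuristic: one syllable per run of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Hypothetical summary sentence, not taken from the study.
summary = "Your test found a change in the BRCA2 gene. This change may guide treatment."
print(round(flesch_kincaid_grade(summary), 2))
```

Dedicated readability libraries use more careful syllable counting, so their scores can differ from this heuristic.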