Using large language models for safety-related table summarization in clinical study reports.

Rogier Landman, Sean P Healey, Vittorio Loprinzo, Ulrike Kochendoerfer, Angela Russell Winnier, Peter V Henstock, Wenyi Lin, Aqiu Chen, Arthi Rajendran, Sushant Penshanwar, Sheraz Khan, Subha Madhavan

doi:10.1093/jamiaopen/ooae043

Abstract

The generation of structured documents for clinical trials is a promising application of large language models (LLMs). We share opportunities, insights, and challenges from a competition that used LLMs to automate clinical trial documentation. As part of a challenge initiated by Pfizer (the organizer), several teams (the participants) created a pilot for generating summaries of safety tables for clinical study reports (CSRs). Our evaluation framework used automated metrics and expert reviews to assess the quality of AI-generated documents. The comparative analysis revealed differences in performance across solutions, particularly in factual accuracy and lean writing. Most participants employed prompt engineering with generative pre-trained transformer (GPT) models. We discuss areas for improvement, including better ingestion of tables, addition of context, and fine-tuning. The challenge results demonstrate the potential of LLMs in automating table summarization in CSRs while also revealing the importance of human involvement and continued research to optimize this technology.

Full Text