Abstract
The advent of general-purpose large language models (LLMs) such as ChatGPT (OpenAI, San Francisco, CA) has revolutionized natural language processing, but their applicability in specialized medical fields such as plastic surgery remains limited by a lack of domain-specific knowledge. This study aims to develop and evaluate PlasticSurgeryGPT, a dedicated LLM fine-tuned on plastic surgery literature, to enhance performance in clinical decision support, surgical education, and research within the field. A dataset of 25,389 plastic surgery research abstracts published between January 1, 2010, and January 1, 2024, was retrieved from PubMed. The abstracts underwent preprocessing, including text cleaning and tokenization, and the pre-trained GPT-2 model was fine-tuned on this dataset using the PyTorch and HuggingFace frameworks. The fine-tuned model, PlasticSurgeryGPT, was evaluated against the base GPT-2 model using the BLEU, METEOR, and ROUGE-1 metrics. PlasticSurgeryGPT showed consistent improvements over generic GPT-2 in capturing the semantic nuances of plastic surgery text, scoring 0.135519 (BLEU), 0.583554 (METEOR), and 0.216813 (ROUGE-1), compared with GPT-2's 0.130179, 0.550498, and 0.215494. To our knowledge, PlasticSurgeryGPT is the first plastic surgery-specific LLM, and it generated more relevant and accurate domain content than the general-purpose model. This work underscores the potential of domain-specific LLMs to improve clinical practice, surgical education, and research in plastic surgery. Future studies should incorporate full-text articles, multimodal data, and larger base models to further improve performance and applicability.
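The abstract names PyTorch and HuggingFace as the fine-tuning stack but gives no implementation details. The sketch below shows what a minimal causal-language-modeling fine-tune of GPT-2 on such an abstract corpus could look like; the `AbstractDataset` wrapper, the hyperparameters, and the `abstracts` placeholder are illustrative assumptions, not details reported in the study.

```python
# Minimal fine-tuning sketch (hypothetical hyperparameters, not the paper's exact setup).
from torch.utils.data import Dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

class AbstractDataset(Dataset):
    """Wraps cleaned PubMed abstracts as tokenized examples for causal LM training."""
    def __init__(self, texts, tokenizer, max_length=512):
        self.examples = [
            tokenizer(t, truncation=True, max_length=max_length, return_tensors="pt")
            for t in texts
        ]
    def __len__(self):
        return len(self.examples)
    def __getitem__(self, idx):
        # Drop the batch dimension added by return_tensors="pt".
        return {k: v.squeeze(0) for k, v in self.examples[idx].items()}

abstracts = ["..."]  # placeholder for the 25,389 cleaned abstract strings

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = GPT2LMHeadModel.from_pretrained("gpt2")

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="plastic_surgery_gpt",
        num_train_epochs=3,            # assumed; not stated in the abstract
        per_device_train_batch_size=8,  # assumed
        learning_rate=5e-5,             # assumed
    ),
    train_dataset=AbstractDataset(abstracts, tokenizer),
    # mlm=False selects the causal (next-token) objective and copies inputs to labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```

Reusing the EOS token as the pad token avoids resizing the embedding matrix, which is the usual workaround when fine-tuning GPT-2 with padded batches.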
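For the reported metrics, the following sketch shows how BLEU, METEOR, and ROUGE-1 can be computed for a single reference/candidate pair using NLTK and the `rouge-score` package. The example sentences and the per-pair scoring scheme are assumptions: the abstract does not state how reference texts were constructed or how scores were aggregated.

```python
# Hedged sketch of the three evaluation metrics on one reference/candidate pair.
import nltk
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

nltk.download("wordnet", quiet=True)  # METEOR relies on WordNet synonym matching

reference = "reduction mammaplasty improves quality of life in symptomatic patients"
candidate = "reduction mammaplasty improves quality of life for symptomatic patients"

# BLEU and METEOR in NLTK expect pre-tokenized inputs.
bleu = sentence_bleu([reference.split()], candidate.split())
meteor = meteor_score([reference.split()], candidate.split())
# ROUGE-1 F-measure (unigram overlap) from Google's rouge-score package.
rouge1 = rouge_scorer.RougeScorer(["rouge1"]).score(reference, candidate)["rouge1"].fmeasure

print(f"BLEU={bleu:.4f}  METEOR={meteor:.4f}  ROUGE-1={rouge1:.4f}")
```

In a corpus-level evaluation such as the one reported, these per-pair scores would typically be averaged over all generated outputs.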