Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback

Shenghuan Sun, Ahmed M Alaa, Gregory M Goldgof, Atul J Butte

doi:10.48550/arxiv.2306.12438

Journal: arXiv (Cornell University)

Publication Date: Jun 16, 2023

Abstract

Generative models capable of capturing nuanced clinical features in medical images hold great promise for facilitating clinical data sharing, enhancing rare disease datasets, and efficiently synthesizing annotated medical images at scale. Despite this potential, assessing the quality of synthetic medical images remains a challenge. While modern generative models can synthesize visually realistic medical images, the clinical validity of these images may be called into question. Domain-agnostic scores, such as the Fréchet Inception Distance (FID), precision, and recall, cannot incorporate clinical knowledge and are therefore not suitable for assessing clinical sensibility. Additionally, generative models may fail to synthesize clinically plausible images in numerous unpredictable ways, which makes it difficult to anticipate failures and manually design scores to detect them. To address these challenges, this paper introduces a pathologist-in-the-loop framework for generating clinically plausible synthetic medical images. Starting with a diffusion model pretrained on real images, our framework comprises three steps: (1) having expert pathologists evaluate the generated images to assess whether they satisfy clinical desiderata, (2) training a reward model that predicts the pathologist feedback on new samples, and (3) incorporating expert knowledge into the diffusion model by using the reward model to inform a finetuning objective. We show that human feedback significantly improves the quality of synthetic images in terms of fidelity, diversity, utility in downstream applications, and plausibility as evaluated by experts.
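To make steps (2) and (3) concrete, the following is a minimal sketch under stated assumptions, not the implementation used in the paper. It assumes each generated image is summarized by a small feature vector (a hypothetical embedding), that pathologist feedback is a binary plausible/implausible label, and that the reward model is a plain logistic regression. The abstract does not specify the form of the finetuning objective; the reward-weighted average loss shown here is one illustrative choice. All function names are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_reward_model(features, labels, lr=0.5, epochs=500):
    """Fit a logistic-regression reward model by batch gradient descent.

    features: list of feature vectors (assumed image embeddings)
    labels:   1 if the expert judged the image plausible, else 0
    """
    dim = len(features[0])
    n = len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            # Gradient of the cross-entropy loss w.r.t. the logit.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_reward(model, x):
    """Predicted probability that an expert would accept the sample."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def reward_weighted_loss(model, features, per_sample_losses):
    """Average the per-sample losses, weighted by predicted reward.

    This weighting is an assumption made for illustration: samples the
    reward model considers implausible contribute less to the objective.
    """
    weights = [predict_reward(model, x) for x in features]
    total = sum(weights)
    return sum(wt * l for wt, l in zip(weights, per_sample_losses)) / total


# Toy expert feedback: three accepted and three rejected samples.
toy_features = [[1.0, 0.5], [0.8, 1.2], [1.5, 0.2],
                [-1.0, -0.4], [-0.7, -1.1], [-1.3, 0.1]]
toy_labels = [1, 1, 1, 0, 0, 0]
reward_model = train_reward_model(toy_features, toy_labels)
```

In a real pipeline the per-sample losses would come from the diffusion model's training objective and the features from the generated images themselves; here they are stand-ins so that the flow from expert labels to a reward-informed objective can be followed end to end.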

Full Text