Deep learning techniques are increasingly being used to classify medical imaging data with high accuracy. However, because training data are often limited, these models can lack the generalizability needed to classify unseen test data from different domains with comparable performance. This study focuses on thyroid histopathology image classification and investigates whether a Generative Adversarial Network (GAN), trained with just 156 patient samples, can produce high-quality synthetic images that sufficiently augment the training data to improve overall model generalizability. Using a StyleGAN2 approach, the generative network produced images with a Fréchet Inception Distance (FID) score of 5.05, matching state-of-the-art GAN results in non-medical domains with comparable dataset sizes. Augmenting the training data with these GAN-generated images increased model generalizability when tested on external data sourced from three separate domains, improving overall precision and AUC by 7.45% and 7.20%, respectively, compared with a baseline model. Most importantly, this performance improvement was observed on minority-class images: tumour subtypes known to suffer from high levels of inter-observer variability when classified by trained pathologists.
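As context for the FID score reported above: FID is the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. A minimal NumPy sketch of the underlying formula is given below; the feature statistics here are illustrative toy values, not the paper's evaluation code, and the helper names are our own.

```python
import numpy as np

def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    # Matrix square root of a symmetric positive semi-definite matrix
    # via eigendecomposition (eigenvalues clipped at zero for stability).
    w, v = np.linalg.eigh(mat)
    return v @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ v.T

def frechet_distance(mu1, cov1, mu2, cov2) -> float:
    # FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2}),
    # computed with the symmetric form Tr((C1^{1/2} C2 C1^{1/2})^{1/2}).
    diff = np.asarray(mu1) - np.asarray(mu2)
    a = _sqrtm_psd(np.asarray(cov1))
    covmean = _sqrtm_psd(a @ np.asarray(cov2) @ a)
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2)
                 - 2.0 * np.trace(covmean))

# Toy check: identical covariances, means shifted by (3, 4)
# -> FID reduces to the squared mean distance, 25.0.
fid = frechet_distance([0.0, 0.0], np.eye(2), [3.0, 4.0], np.eye(2))
print(fid)  # → 25.0 (up to floating-point error)
```

In practice the means and covariances are estimated from Inception feature vectors of the real and synthetic image sets; lower FID indicates closer distributions.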