Abstract

Around 1% of the population of the UK and North America have a diagnosis of coeliac disease (CD), a condition in which an immune response damages the small intestine. Assessing whether a patient has CD relies primarily on the examination of a duodenal biopsy, an unavoidably subjective process with poor inter-observer concordance. Wei et al. [11] developed a neural network-based method for diagnosing CD using a dataset of duodenal biopsy whole slide images (WSIs). As all of their training and validation data came from one source, there was no guarantee that their results would generalize to WSIs obtained from different scanners and laboratories. In this study, we compared the effects of applying stain normalization and stain jittering to the training data. We trained a deep neural network on 331 WSIs obtained with a Ventana scanner (CD: n=190; normal: n=141) to classify the presence of CD. To test the effects of stain processing on WSIs from different scanners and laboratories, the neural network was validated on four datasets: WSIs of slides scanned on a Ventana scanner (CD: n=48; normal: n=35), WSIs of the same slides rescanned on a Hamamatsu scanner (CD: n=48; normal: n=35), WSIs of the same slides rescanned on an Aperio scanner (CD: n=48; normal: n=35), and WSIs of different slides scanned on an Aperio scanner (CD: n=38; normal: n=37). Without stain processing, the F1 scores of the neural network were 0.947, 0.619, 0.746, and 0.727 when validating on the Ventana validation WSIs, the Hamamatsu rescans of those WSIs, the Aperio rescans of those WSIs, and the Aperio WSIs from a different source, respectively. With stain normalization, the performance of the neural network improved significantly, with respective F1 scores of 0.982, 0.943, 0.903, and 0.847.
Stain jittering resulted in better performance than stain normalization when validating on data from the same source (F1 score 1.000), but in poorer performance than stain normalization when validating on WSIs from different scanners (F1 scores 0.939, 0.814, and 0.747). This study shows the importance of stain processing, in particular stain normalization, when training machine learning models on duodenal biopsy WSIs to ensure generalizability between different scanners and laboratories.
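For readers unfamiliar with the metric reported above, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes CD is treated as the positive class, and the function name `f1_score` and the confusion counts are hypothetical values chosen only for illustration.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 = 2*TP / (2*TP + FP + FN), or 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts (NOT taken from the study), with CD assumed to be
# the positive class: 40 CD slides correctly flagged, 5 normal slides
# wrongly flagged as CD, and 8 CD slides missed.
tp, fp, fn = 40, 5, 8

precision = tp / (tp + fp)  # share of predicted-CD slides that are truly CD
recall = tp / (tp + fn)     # share of true CD slides that were detected
score = f1_score(tp, fp, fn)

print(f"precision={precision:.3f} recall={recall:.3f} F1={score:.3f}")
```

Because the F1 score ignores true negatives, it depends on which class is designated as positive. Reversing that designation would generally give a different value for the same predictions.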
