High-throughput adaptive sampling for whole-slide histopathology image analysis (HASHI) via convolutional neural networks: Application to invasive breast cancer detection.

Natalie Shih, Shridar Ganesan, Fabio González, John Tomaszewski, Michael Feldman, Anant Madabhushi, Angel Cruz-Roa, Ajay Basavanhally, Yuanquan Wang, Hannah Gilmore

doi:10.1371/journal.pone.0196828

Abstract

Precise detection of invasive cancer on whole-slide images (WSI) is a critical first step in digital pathology tasks of diagnosis and grading. Convolutional neural networks (CNNs) are the most popular representation learning method for computer vision tasks and have been successfully applied in digital pathology, including tumor and mitosis detection. However, CNNs are typically only tenable with relatively small image sizes (200 × 200 pixels). Only recently have fully convolutional networks (FCNs) been able to deal with larger image sizes (500 × 500 pixels) for semantic segmentation. Hence, the direct application of CNNs to WSI is not computationally feasible, because a CNN operating on an entire WSI would require billions or trillions of parameters. To alleviate this issue, this paper presents a novel method, High-throughput Adaptive Sampling for whole-slide Histopathology Image analysis (HASHI), which involves: i) a new efficient adaptive sampling method based on probability gradient and quasi-Monte Carlo sampling, and ii) a powerful representation learning classifier based on CNNs. We applied HASHI to automated detection of invasive breast cancer on WSI. HASHI was trained and validated using three different data cohorts involving nearly 500 cases and then independently tested on 195 studies from The Cancer Genome Atlas. The results show that (1) the adaptive sampling method is an effective strategy to deal with WSI without compromising prediction accuracy, obtaining results comparable to dense sampling (∼6 million samples in 24 hours) with far fewer samples (∼2,000 samples in 1 minute), and (2) on an independent test dataset, HASHI is effective and robust to data from multiple sites, scanners, and platforms, achieving an average Dice coefficient of 76%.
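
The adaptive sampling idea in the abstract (quasi-Monte Carlo sampling guided by the gradient of the probability map) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Halton sequence as the quasi-random generator, a hypothetical `toy_classifier` standing in for the CNN, and a crude gradient proxy (the probability spread among the nearest existing samples). All function names, parameters, and thresholds here are illustrative assumptions.

```python
import math

def van_der_corput(index, base):
    """Radical inverse of a positive integer index in the given base."""
    value, fraction = 0.0, 1.0 / base
    while index > 0:
        value += (index % base) * fraction
        index //= base
        fraction /= base
    return value

def halton_point(index):
    """index-th point (1-based) of the 2-D Halton sequence in the unit square."""
    return van_der_corput(index, 2), van_der_corput(index, 3)

def toy_classifier(x, y):
    """Hypothetical stand-in for the CNN tile classifier: the probability is
    high inside a disc of radius 0.25 centred at (0.5, 0.5)."""
    distance = math.hypot(x - 0.5, y - 0.5)
    return 1.0 / (1.0 + math.exp((distance - 0.25) / 0.02))

def local_variation(samples, x, y, k=4):
    """Crude gradient proxy: probability spread among the k nearest samples."""
    nearest = sorted(samples, key=lambda s: (s[0] - x) ** 2 + (s[1] - y) ** 2)[:k]
    probabilities = [s[2] for s in nearest]
    return max(probabilities) - min(probabilities)

def adaptive_sample(classify, initial=64, rounds=4, per_round=64,
                    threshold=0.3, max_candidates=2000):
    """Return (x, y, probability) samples, with the initial coarse pass first."""
    samples = []
    index = 1
    # Coarse pass: classify a small quasi-random set covering the whole image.
    for _ in range(initial):
        x, y = halton_point(index)
        samples.append((x, y, classify(x, y)))
        index += 1
    # Refinement: keep drawing quasi-random candidates, but classify only
    # those that land where the current probability map changes quickly.
    for _ in range(rounds):
        accepted = []
        for _ in range(max_candidates):
            if len(accepted) == per_round:
                break
            x, y = halton_point(index)
            index += 1
            if local_variation(samples, x, y) >= threshold:
                accepted.append((x, y))
        samples.extend((x, y, classify(x, y)) for x, y in accepted)
    return samples

if __name__ == "__main__":
    result = adaptive_sample(toy_classifier)
    print(len(result), "classifier calls in total")
```

In this sketch the expensive call (`classify`) is made only for the coarse pass and for candidates near a suspected boundary, which is the intuition behind matching dense sampling with far fewer samples.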

Highlights

The advent of whole-slide digital scanners has allowed for rapid digitization of histopathology slides, making these digitized slide images easy to store, visualize, share and analyze using computational tools.
This paper presents High-throughput Adaptive Sampling for whole-slide Histopathology Image analysis (HASHI), a novel, accurate and high-throughput framework that combines the powerful capabilities of convolutional neural network (CNN) models for image recognition with an adaptive sampling method for rapid detection of the precise extent of invasive breast cancer (BCa) on whole-slide images (WSI).
The following kernel functions were evaluated: linear, radial basis function (RBF), intersection, Chi-square (χ²), and Jensen-Shannon.
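
For readers unfamiliar with the kernel functions listed in the last highlight, the sketch below gives commonly used definitions for non-negative feature vectors such as histograms. The source does not state which exact forms or parameters were used, so the additive Chi-square and Jensen-Shannon variants and the `gamma` default here are assumptions.

```python
import math

def linear_kernel(x, y):
    """Dot product of the two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel on the squared Euclidean distance (gamma is assumed)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def intersection_kernel(x, y):
    """Histogram intersection: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))

def chi_square_kernel(x, y):
    """Additive Chi-square kernel; empty bins contribute nothing."""
    return sum(2.0 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)

def jensen_shannon_kernel(x, y):
    """Additive Jensen-Shannon kernel with base-2 logarithms."""
    total = 0.0
    for a, b in zip(x, y):
        if a > 0:
            total += (a / 2.0) * math.log2((a + b) / a)
        if b > 0:
            total += (b / 2.0) * math.log2((a + b) / b)
    return total
```

With these definitions, the intersection, Chi-square, and Jensen-Shannon kernels of an L1-normalized histogram with itself all equal 1, and all three are 0 for histograms with no overlapping bins.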

Summary

Introduction

The advent of whole-slide digital scanners has allowed for rapid digitization of histopathology slides, making these digitized slide images easy to store, visualize, share and analyze using computational tools. This rapidly growing field of digital pathology [1,2,3] is resulting in one of the newest forms of “big data”. The Cancer Genome Atlas (TCGA) currently hosts 11,079 cancer studies involving 34 different types of cancer, amounting to over 1,095 Terabytes (∼1 Petabyte) of data [4]. This high volume of data requires the development and application of high-throughput computational image analysis approaches for mining the digital image data. CNNs are multilayer neural networks that combine different types of layers (convolutional, pooling, classification) and need to be trained in a supervised manner [5] for image analysis and classification tasks; existing applications have focused on very small images [7,8,9].
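
The three layer types named above (convolutional, pooling, classification) can be illustrated with a toy forward pass in plain Python. This is only a sketch of how the layers compose, not the network used in the paper: the function names are hypothetical, the weights would normally be learned by supervised training rather than set by hand, and real CNNs stack many such layers with many filters.

```python
import math

def conv2d(image, kernel):
    """Convolutional layer: valid 2-D cross-correlation followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    output = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = sum(image[i + di][j + dj] * kernel[di][dj]
                      for di in range(kh) for dj in range(kw))
            row.append(max(0.0, acc))
        output.append(row)
    return output

def max_pool(feature_map, size=2):
    """Pooling layer: maximum over non-overlapping size x size windows."""
    return [[max(feature_map[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_features(features, weights, biases):
    """Classification layer: flatten, apply one linear map per class, softmax."""
    flat = [value for row in features for value in row]
    scores = [sum(w * v for w, v in zip(class_weights, flat)) + bias
              for class_weights, bias in zip(weights, biases)]
    return softmax(scores)
```

A 5 × 5 input with a 2 × 2 kernel yields a 4 × 4 feature map, which 2 × 2 pooling reduces to 2 × 2, so the classification layer in that case needs four weights per class.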

Methods

Results

Conclusion