Breast Cancer Multi-classification from Histopathological Images with Structured Deep Learning Model

Zhongyi Han, Shuo Li, Benzheng Wei, Yuanjie Zheng, Yilong Yin, Kejian Li

doi:10.1038/s41598-017-04075-z

Abstract

Automated breast cancer multi-classification from histopathological images plays a key role in computer-aided breast cancer diagnosis or prognosis. Breast cancer multi-classification aims to identify the subordinate classes of breast cancer (ductal carcinoma, fibroadenoma, lobular carcinoma, etc.). However, it faces two main challenges: (1) multi-classification is considerably more difficult than binary classification (benign versus malignant), and (2) the differences between classes are subtle, owing to the broad variability of high-resolution image appearances, the high coherency of cancerous cells, and the extensive inhomogeneity of color distribution. Automated breast cancer multi-classification from histopathological images is therefore of great clinical significance, yet it has never been explored. Existing works in the literature focus only on binary classification and do not support further quantitative assessment of breast cancer. In this study, we propose a breast cancer multi-classification method using a newly proposed deep learning model. The structured deep learning model achieves remarkable performance (average 93.2% accuracy) on a large-scale dataset, which demonstrates the strength of our method as an efficient tool for breast cancer multi-classification in clinical settings.

Highlights

Automated breast cancer multi-classification from histopathological images is significant for clinical diagnosis and prognosis with the launch of the precision medicine initiative[1, 2]
General feature descriptors have been developed for feature extraction, e.g., scale-invariant feature transform (SIFT)[9], gray-level co-occurrence matrix (GLCM)[10], histogram of oriented gradients (HOG)[11], etc.
The source data comes from 82 anonymous patients of Pathological Anatomy and Cytopathology (P&D) Lab, Brazil

Summary

Results

Following the standard method in the machine learning domain[19], augmentation is applied only to the training set; validation and testing use the real-world data and are carried out patient-wise. We first split the whole dataset patient-wise into training, validation, and testing sets, and then augment the training examples according to the ratios of the imbalanced classes. Splitting by patient makes the results more reliable: the patients in the three sets are non-overlapping, and all experimental results are the average accuracy from five cross
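The protocol described above (split by patient first, then augment only the training set in proportion to class imbalance) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the 60/20/20 split ratios, the `(patient_id, label, image_id)` sample layout, and the rule of scaling each class toward the size of the largest class are all assumptions made for the example.

```python
import random
from collections import Counter


def patient_wise_split(samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split samples into train/val/test so that no patient spans two sets.

    samples: list of (patient_id, label, image_id) tuples (hypothetical layout).
    ratios:  fraction of patients per set (hypothetical values).
    """
    patients = sorted({pid for pid, _, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(patients)

    n_train = int(len(patients) * ratios[0])
    n_val = int(len(patients) * ratios[1])
    groups = {
        "train": set(patients[:n_train]),
        "val": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),
    }
    # Every image follows its patient into exactly one set.
    return {name: [s for s in samples if s[0] in ids]
            for name, ids in groups.items()}


def augmentation_factors(train_samples):
    """How many times to replicate (augment) each class in the training set.

    Assumed rule: scale each class toward the size of the largest class.
    """
    counts = Counter(label for _, label, _ in train_samples)
    largest = max(counts.values())
    return {label: max(1, round(largest / n)) for label, n in counts.items()}


if __name__ == "__main__":
    # Toy data: 10 patients, 4 images each, one tumor class per patient.
    toy = [(f"p{p}", "fibroadenoma" if p % 3 == 0 else "ductal_carcinoma",
            f"img_{p}_{i}") for p in range(10) for i in range(4)]
    split = patient_wise_split(toy)
    # Factors are computed from the training set only; val/test stay untouched.
    print(augmentation_factors(split["train"]))
```

Computing the factors from `split["train"]` alone keeps augmented images out of validation and testing, which is the point of augmenting after the patient-wise split rather than before it.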


Methods