An active learning based classification strategy for the minority class problem: application to histopathology annotation

Scott Doyle, Anant Madabhushi, Michael Feldman, James Monaco, John Tomaszewski

doi:10.1186/1471-2105-12-424


Open Access

https://doi.org/10.1186/1471-2105-12-424


Abstract

Background

Supervised classifiers for digital pathology can improve the ability of physicians to detect and diagnose diseases such as cancer. Generating training data for classifiers is problematic, since only domain experts (e.g. pathologists) can correctly label ground truth data. Additionally, digital pathology datasets suffer from the "minority class problem", in which exemplars from the non-target class outnumber target class exemplars; this can bias the classifier and reduce accuracy. In this paper, we develop a training strategy combining active learning (AL) with class-balancing. AL identifies unlabeled samples that are "informative" (i.e. likely to increase classifier performance) for annotation, avoiding non-informative samples. This yields high accuracy with a smaller training set size compared with random learning (RL). Previous AL methods have not explicitly accounted for the minority class problem in biomedical images. Pre-specifying a target class ratio mitigates the problem of training bias. Finally, we develop a mathematical model to predict the number of annotations (cost) required to achieve balanced training classes. In addition to predicting training cost, the model reveals the theoretical properties of AL in the context of the minority class problem.

Results

Using this class-balanced AL training strategy (CBAL), we build a classifier to distinguish cancer from non-cancer regions on digitized prostate histopathology. Our dataset consists of 12,000 image regions sampled from 100 biopsies (58 prostate cancer patients). We compare CBAL against: (1) unbalanced AL (UBAL), which uses AL but ignores class ratio; (2) class-balanced RL (CBRL), which uses RL with a specific class ratio; and (3) unbalanced RL (UBRL). The CBAL-trained classifier yields 2% greater accuracy and 3% higher area under the receiver operating characteristic curve (AUC) than alternatively-trained classifiers. Our cost model accurately predicts the number of annotations necessary to obtain balanced classes. The accuracy of our prediction is verified by empirically-observed costs. Finally, we find that over-sampling the minority class yields a marginal improvement in classifier accuracy, but the improved performance comes at the expense of greater annotation cost.

Conclusions

We have combined AL with class balancing to yield a general training strategy applicable to most supervised classification problems where the dataset is expensive to obtain and which suffers from the minority class problem. An intelligent training strategy is a critical component of supervised classification, but the integration of AL and intelligent choice of class ratios, as well as the application of a general cost model, will help researchers to plan the training process more quickly and effectively.
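The abstract describes the class-balancing step only at a high level. The following is a minimal sketch of one way a class-balanced query round could work. It is not the authors' implementation and it makes several assumptions: the candidate pool is already ranked from most to least informative, every expert query costs one annotation, and annotated samples from a class whose quota is already full are discarded. The names `class_balanced_round`, `oracle`, `k_target`, and `k_nontarget` are hypothetical.

```python
def class_balanced_round(ranked_pool, oracle, k_target, k_nontarget):
    """Run one illustrative class-balanced annotation round.

    ranked_pool  -- sample ids, most informative first (ranking done elsewhere)
    oracle       -- callable returning the expert label: 1 = target, 0 = non-target
    k_target     -- number of target-class samples wanted this round
    k_nontarget  -- number of non-target samples wanted this round

    Returns (accepted, cost, remaining): the (sample, label) pairs added to the
    training set, the number of annotations spent, and the unqueried samples.
    """
    need = {1: k_target, 0: k_nontarget}
    accepted = []
    cost = 0
    for sample in ranked_pool:
        if need[0] == 0 and need[1] == 0:
            break  # both class quotas are filled
        label = oracle(sample)  # every query is paid for, kept or not
        cost += 1
        if need[label] > 0:
            accepted.append((sample, label))
            need[label] -= 1
        # Otherwise the sample is annotated but discarded (assumption),
        # which is what makes balancing cost extra annotations.
    remaining = ranked_pool[cost:]  # the first `cost` samples were queried
    return accepted, cost, remaining


# Toy data (hypothetical): samples 3 and 5 belong to the minority target class.
toy_labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 0, 5: 1}
accepted, cost, remaining = class_balanced_round(
    [0, 1, 2, 3, 4, 5], toy_labels.__getitem__, k_target=1, k_nontarget=1
)
print(accepted, cost, remaining)
```

In this toy run, four annotations are spent to obtain one sample of each class, because two non-target samples are annotated after the non-target quota is already full. This illustrates the trade-off between balanced classes and annotation cost that the abstract mentions.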

Highlights

Supervised classifiers for digital pathology can improve the ability of physicians to detect and diagnose diseases such as cancer.
Class ratios are addressed in this training strategy to prevent the training set from being biased toward the majority class.
We applied these techniques to the quantitative analysis of digitized prostate tissue samples for the presence of cancer. The class-balanced AL (CBAL) training strategy yielded a classifier whose accuracy and area under the curve (AUC) were similar to those obtained with the full training set, while using fewer samples than the unbalanced active learning (AL), class-balanced random learning, or unbalanced random learning methods.

Summary

Introduction

Supervised classifiers for digital pathology can improve the ability of physicians to detect and diagnose diseases such as cancer. AL identifies unlabeled samples that are “informative” (i.e. likely to increase classifier performance) for annotation, avoiding non-informative samples. This yields high accuracy with a smaller training set size compared with random learning (RL). In the application considered here, the goal of the supervised classifier is to identify regions of carcinoma of the prostate (CaP, the target class). CaP often appears within and around non-CaP areas, and the boundary between these regions is not always clear, even to a trained expert. These factors increase the time, effort, and overall cost associated with training a supervised classifier in the context of digital pathology. A strategy known as active learning (AL) was developed to select only “informative” exemplars for annotation [9,10].
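The summary does not say how "informative" is measured. One common choice, shown here only as an assumption, is to score each unlabeled sample by how evenly a committee of weak classifiers disagrees about it. The committee of threshold classifiers, the 1-D feature values, and the function names below are all hypothetical and chosen only to keep the sketch self-contained.

```python
def committee_ambiguity(votes):
    """Return 1.0 for an even split of binary votes and 0.0 for a unanimous vote."""
    frac_target = sum(votes) / len(votes)
    return 1.0 - abs(2.0 * frac_target - 1.0)


def most_informative(samples, committee, n):
    """Return the indices of the n samples the committee disagrees on most.

    Ties are broken by the original sample order.
    """
    scored = []
    for index, x in enumerate(samples):
        votes = [member(x) for member in committee]
        scored.append((committee_ambiguity(votes), index))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:n]]


# Hypothetical committee: four threshold classifiers on a single feature.
committee = [lambda x, t=t: int(x > t) for t in (0.3, 0.5, 0.7, 0.9)]
samples = [0.1, 0.4, 0.6, 0.95]
print(most_informative(samples, committee, 2))
```

With these toy values, the sample at 0.6 splits the committee evenly and the sample at 0.4 draws one dissenting vote, so those two would be sent to the expert first. The samples at 0.1 and 0.95 receive unanimous votes and would be skipped as non-informative.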



Journal: BMC Bioinformatics	Publication Date: Oct 28, 2011
Citations: 92	License type: CC BY 2.0

