Abstract

Motivation: The localization of objects in images is a longstanding objective within the field of image processing. Most current techniques are based on machine learning approaches, which typically require careful annotation of training samples in the form of expensive bounding box labels. The need for such large-scale annotation has only been exacerbated by the widespread adoption of deep learning techniques within the image processing community: deep learning is notoriously data-hungry. Method: In this work, we attack this problem directly by providing a new method for learning to localize objects with limited annotation: most training images can simply be annotated with their whole image labels (and no bounding box), with only a small fraction marked with bounding boxes. The training is driven by a novel loss function, which is a continuous relaxation of a well-defined discrete formulation of weakly supervised learning. Care is taken to ensure that the loss is numerically well-posed. Additionally, we propose a neural network architecture which accounts for both patch dependence, through the use of Conditional Random Field layers, and shift-invariance, through the inclusion of anti-aliasing filters. Results: We demonstrate our method on the task of localizing thoracic diseases in chest X-ray images, achieving state-of-the-art performance on the ChestX-ray14 dataset. We further show that with a modicum of additional effort our technique can be extended from object localization to object detection, attaining high quality results on the Kaggle RSNA Pneumonia Detection Challenge. Conclusion: The technique presented in this paper has the potential to enable high accuracy localization in regimes in which annotated data is either scarce or expensive to acquire. Future work will focus on applying the ideas presented in this paper to the realm of semantic segmentation.
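To make the mixed-supervision setup concrete, the following sketch (in PyTorch) combines an image-level loss, applied to every training image, with a patch-level loss derived from rasterized bounding boxes, applied only to the small strongly annotated fraction. The tensor layout, the LogSumExp pooling, the mask construction, and the weight lambda_strong are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def mixed_supervision_loss(patch_logits, image_labels, box_masks, has_box,
                               lambda_strong=1.0):
        """Hedged sketch of a mixed-supervision objective.

        patch_logits: (B, C, H, W) per-patch disease logits from the network.
        image_labels: (B, C) binary whole-image labels (float), available for all images.
        box_masks:    (B, C, H, W) binary masks rasterized from bounding boxes,
                      meaningful only where has_box is True.
        has_box:      (B,) bool, True for the small strongly annotated fraction.
        """
        # Weak term: pool patch logits to an image-level prediction and compare it
        # with the whole-image label. LogSumExp pooling is a smooth surrogate for
        # the discrete "max over patches" rule.
        image_logits = torch.logsumexp(patch_logits.flatten(2), dim=2)   # (B, C)
        weak = F.binary_cross_entropy_with_logits(image_logits, image_labels)

        # Strong term: where boxes exist, supervise every patch directly with the
        # rasterized box mask.
        if has_box.any():
            strong = F.binary_cross_entropy_with_logits(
                patch_logits[has_box], box_masks[has_box])
        else:
            strong = patch_logits.new_zeros(())

        return weak + lambda_strong * strong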

Highlights

  • Large-scale labelled datasets are one of the key ingredients in many recent algorithms in image processing and computer vision

  • To use deep learning for detection or segmentation in the standard fashion, one requires a fair number of images with annotations that mirror the desired output: bounding boxes in the case of detection, and pixel-level masks in the case of segmentation

  • We propose a new architecture for localization which accounts for both patch dependence and shift-invariance, through the inclusion of Conditional Random Field (CRF) layers and anti-aliasing filters, respectively; an anti-aliasing layer of this kind is sketched below
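The shift-invariance part of the last highlight can be made concrete with an anti-aliased downsampling layer in the spirit of blur-pooling: low-pass filter a feature map before subsampling it, so that small input shifts do not produce large output changes. The 3x3 binomial kernel and the class name BlurDownsample below are assumptions for illustration, not the paper's exact layer.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BlurDownsample(nn.Module):
        """Illustrative anti-aliasing layer: blur with a fixed low-pass kernel,
        then subsample by `stride`. This reduces the aliasing introduced by plain
        strided pooling or convolution, improving shift-invariance."""

        def __init__(self, channels, stride=2):
            super().__init__()
            self.stride = stride
            # 3x3 binomial (approximately Gaussian) low-pass filter.
            k = torch.tensor([1., 2., 1.])
            kernel = torch.outer(k, k)
            kernel = kernel / kernel.sum()
            # One copy of the kernel per channel (depthwise filtering).
            self.register_buffer("kernel", kernel.expand(channels, 1, 3, 3).contiguous())

        def forward(self, x):
            x = F.pad(x, (1, 1, 1, 1), mode="reflect")
            return F.conv2d(x, self.kernel, stride=self.stride, groups=x.shape[1])

Such a layer is typically dropped in wherever a backbone would otherwise use a plain stride-2 pooling step or strided convolution.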


Summary

INTRODUCTION

Large-scale labelled datasets are one of the key ingredients in many recent algorithms in image processing and computer vision. In the medical domain, systems based on computer-aided diagnosis (CAD) are desirable, as they can perform automatic detection of diseases and pathologies in a consistent way, and perhaps with greater accuracy than human experts. To this end, Rajpurkar et al. proposed a deep-learning-based algorithm [4] which outperforms radiologists in disease classification on the ChestX-ray14 dataset [5]. The main contributions of this paper are as follows: 1) We propose a novel loss function for object localization with limited annotation. This loss is a continuous relaxation of a well-defined discrete formulation of weakly supervised learning, and is numerically well-posed. 2) We propose a new architecture for localization which accounts for both patch dependence and shift-invariance, through the inclusion of Conditional Random Field (CRF) layers and anti-aliasing filters, respectively. We note that a preliminary version of this paper was presented at the Machine Learning for Health Workshop at NeurIPS 2019 [7].
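As an illustration of what a continuous relaxation of a discrete weak-supervision rule can look like, the snippet below relaxes "an image is positive if and only if at least one of its patches is positive" with a noisy-OR and evaluates it in log space, so that the product over many patches does not underflow. This is a generic, hedged example of such a relaxation and of the kind of numerical care mentioned above; it is not the paper's specific loss.

    import torch
    import torch.nn.functional as F

    def noisy_or_image_log_prob(patch_logits):
        """patch_logits: (B, C, N) per-patch logits over N patches.

        Discrete rule:  y_image = OR over the N patch labels.
        Relaxation:     p(image positive) = 1 - prod_i (1 - p_i),  p_i = sigmoid(z_i).
        Evaluating the product in log space keeps the objective well behaved even
        when many p_i are extremely close to 0 or 1.
        """
        # log(1 - p_i) = log sigmoid(-z_i), computed stably by logsigmoid.
        log_one_minus_p = F.logsigmoid(-patch_logits)      # (B, C, N)
        log_p_neg = log_one_minus_p.sum(dim=-1)            # log prod_i (1 - p_i)
        # log p(image positive) = log(1 - exp(log_p_neg)); the clamp guards the
        # exp(0) = 1 corner case.
        log_p_pos = torch.log1p(-torch.exp(log_p_neg).clamp(max=1.0 - 1e-6))
        return log_p_pos, log_p_neg

An image-level cross-entropy then reads -(y * log_p_pos + (1 - y) * log_p_neg), summed over classes.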

RELATED WORK
THE LOSS FUNCTION
IMPLEMENTATION DETAILS
RESULTS
Overall Results
ABLATION STUDY
Findings
CONCLUSION