Abstract
Top-performing computer vision models are powered by convolutional neural networks (CNNs). Training an accurate CNN depends heavily on both the raw sensor data and the associated ground truth (GT). Collecting such GT is usually done through human labeling, which is time-consuming and does not scale as we wish. This data-labeling bottleneck may be intensified due to domain shifts among image sensors, which could force per-sensor data labeling. In this paper, we focus on the use of co-training, a semi-supervised learning (SSL) method, for obtaining self-labeled object bounding boxes (BBs), i.e., the GT to train deep object detectors. In particular, we assess the goodness of multi-modal co-training by relying on two different views of an image, namely, appearance (RGB) and estimated depth (D). Moreover, we compare appearance-based single-modal co-training with multi-modal. Our results suggest that in a standard SSL setting (no domain shift, a few human-labeled data) and under virtual-to-real domain shift (many virtual-world labeled data, no human-labeled data), multi-modal co-training outperforms single-modal. In the latter case, after performing GAN-based domain translation, both co-training modalities are on par, at least when using an off-the-shelf depth estimation model not specifically trained on the translated images.
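As a rough illustration of the self-labeling loop described above, the sketch below shows how two detectors, one per view (appearance/RGB and estimated depth), could cross-teach each other: each detector's most confident bounding boxes on unlabeled frames become pseudo-GT for the other view's detector. All names here (Sample, Box, train_detector, the thresholds and cycle counts) are illustrative placeholders of ours, not the authors' implementation or hyper-parameters.

```python
from typing import Any, Callable, List, Tuple

# Placeholder types for this sketch: a Box is (x1, y1, x2, y2, confidence),
# and a Sample is an image in one of the two views (RGB or estimated depth).
Box = Tuple[float, float, float, float, float]
Sample = Any


def co_train(
    labeled_rgb: List[Tuple[Sample, List[Box]]],
    labeled_depth: List[Tuple[Sample, List[Box]]],
    unlabeled_rgb: List[Sample],
    unlabeled_depth: List[Sample],
    train_detector: Callable[[List[Tuple[Sample, List[Box]]]], Callable[[Sample], List[Box]]],
    cycles: int = 10,
    conf_threshold: float = 0.8,
    per_cycle: int = 100,
) -> Tuple[List[Tuple[Sample, List[Box]]], List[Tuple[Sample, List[Box]]]]:
    """Multi-modal co-training sketch: each view's detector self-labels
    confident detections, which extend the *other* view's training set."""
    assert len(unlabeled_rgb) == len(unlabeled_depth), "views must be paired per frame"

    for _ in range(cycles):
        det_rgb = train_detector(labeled_rgb)      # retrain appearance detector
        det_depth = train_detector(labeled_depth)  # retrain depth detector

        new_for_depth, new_for_rgb = [], []
        for img_rgb, img_d in zip(unlabeled_rgb, unlabeled_depth):
            boxes_rgb = [b for b in det_rgb(img_rgb) if b[4] >= conf_threshold]
            boxes_d = [b for b in det_depth(img_d) if b[4] >= conf_threshold]
            # Cross-teaching: RGB pseudo-labels feed the depth detector and vice versa.
            if boxes_rgb:
                new_for_depth.append((img_d, boxes_rgb))
            if boxes_d:
                new_for_rgb.append((img_rgb, boxes_d))

        # Keep only the most confident self-labeled frames from this cycle.
        def best_conf(item: Tuple[Sample, List[Box]]) -> float:
            return -max(b[4] for b in item[1])

        labeled_depth += sorted(new_for_depth, key=best_conf)[:per_cycle]
        labeled_rgb += sorted(new_for_rgb, key=best_conf)[:per_cycle]

    return labeled_rgb, labeled_depth
```

A real setup would also remove self-labeled frames from the unlabeled pools and could re-score previously self-labeled frames at each cycle; those details are omitted here for brevity.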
Highlights
Supervised deep learning enables accurate computer vision models
We have addressed the curse of data labeling for onboard deep object detection
Following the semi-supervised learning (SSL) paradigm, we have proposed multi-modal co-training for object detection
Summary
Supervised deep learning enables accurate computer vision models. Key for this success is the access to raw sensor data (i.e., images) with ground truth (GT) for the visual task at hand (e.g., image classification [1], object detection [2] and recognition [3], pixel-wise instance/semantic segmentation [4,5], monocular depth estimation [6], 3D reconstruction [7], etc.). In increasing order of labeling time, we see that image classification requires image-level tags, object detection requires object bounding boxes (BBs), instance/semantic segmentation requires pixel-level instance/class silhouettes, and depth GT cannot be manually provided. Manually collecting such GT is time-consuming and does not scale as we wish. This data-labeling bottleneck may be intensified due to domain shifts among different image sensors, which could force per-sensor data labeling.