Abstract

Recently proposed spherical convolutional neural networks (SCNNs) have shown advantages over conventional planar CNNs in classifying spherical images. However, two factors hamper their application to object detection. First, a convolution in S2 (a two-dimensional sphere in three-dimensional space) or SO(3) (the three-dimensional special orthogonal group) loses an object's location. Second, an excessively large bandwidth is required to preserve a small object's information on a sphere, because the S2/SO(3) convolution must be performed on the whole sphere instead of on a local image patch. In this study, we propose a novel grid-based spherical CNN (G-SCNN) for detecting objects in spherical images. According to the input bandwidth, a spherical image is transformed into a conformal grid map that serves as the input to the S2/SO(3) convolution, and an object's bounding box is scaled to cover an adequate area of the grid map. This addresses the second problem. For the first problem, we use a planar region proposal network (RPN) with a data augmentation strategy that increases rotation invariance. We have also created a dataset of 600 street-view panoramic images captured with a vehicle-borne panoramic camera. The dataset, named the WHU (Wuhan University) panoramic dataset, contains 5636 objects of interest annotated with class and bounding box. Results on this dataset show that our grid-based method performs substantially better than the original SCNN in detecting objects in spherical images, and that it outperforms several mainstream object detection networks, such as Faster R-CNN and SSD.

Highlights

  • A vision-based object detection task is to recognize and locate objects of interest in a given image efficiently and accurately

  • Our method extends the applications of spherical CNN (SCNN) to object detection for the first time

  • This study proposes a novel and effective grid-based spherical convolutional neural network (G-SCNN) that extends the capacity of a spherical CNN to object detection for the first time

Summary

Introduction

The task of vision-based object detection is to recognize and locate objects of interest in a given image efficiently and accurately. Convolutional neural networks (CNNs) have shown outstanding performance in object detection [1,2,3], as well as in other vision tasks such as image classification [4,5,6]. As omnidirectional and panoramic cameras have found a wide range of applications in virtual reality [12], driverless cars [13], monitoring systems [14] and SLAM [15,16], detecting objects in spherical images has become increasingly important. Unlike in planar images, position-dependent distortions are unavoidable when a spherical signal is projected onto a planar representation. This type of distortion was deeply investigated in.
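To make the position-dependent distortion concrete, the following minimal sketch assumes an equirectangular (longitude-latitude) panorama. The text above does not name a planar representation, so this choice is an assumption made for illustration. In that projection every image row spans the full 360° of longitude, so a row at latitude φ is stretched horizontally by a factor of 1/cos φ relative to the equator. The function names `row_to_latitude_deg` and `equirect_horizontal_stretch` are hypothetical helpers, not part of the paper's method.

```python
import math


def row_to_latitude_deg(row: int, height: int) -> float:
    """Latitude (degrees) of the center of a pixel row in an
    equirectangular image; row 0 is the top (near the north pole)."""
    if not 0 <= row < height:
        raise ValueError("row is outside the image")
    return 90.0 - (row + 0.5) / height * 180.0


def equirect_horizontal_stretch(lat_deg: float) -> float:
    """Horizontal stretch factor 1 / cos(latitude) of an equirectangular
    projection, relative to the equator (assumed projection, see lead-in)."""
    c = math.cos(math.radians(lat_deg))
    if c <= 1e-12:
        # The stretch grows without bound as the latitude approaches a pole.
        raise ValueError("stretch is unbounded at the poles")
    return 1.0 / c


if __name__ == "__main__":
    height = 512  # hypothetical panorama height in pixels
    for row in (0, 128, 255):
        lat = row_to_latitude_deg(row, height)
        print(f"row {row:3d}: latitude {lat:7.3f} deg, "
              f"stretch {equirect_horizontal_stretch(lat):.3f}")
```

Under this assumption, an object at 60° latitude appears about twice as wide as the same object at the equator, which is why a planar detector sees the same object with different shapes depending on where it lies in the panorama.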
