Abstract

How an image is represented as the input of a convolutional neural network (CNN) is important because the input directly influences the performance of the CNN. In this paper, we investigate the representation of spherical images by focusing on the inclination estimation of a spherical camera. Unlike other approaches to CNN-based inclination estimation, the proposed method represents a spherical image as a geodesic-division-based discrete spherical image (DSI), which is obtained by sampling a sphere as uniformly as possible. The input of the CNN is a single image that consists of five parallelograms flattened from a regular icosahedron. To demonstrate the advantage of the proposed method, comparative experiments are conducted with two other spherical image representations, namely, equirectangular projection (ERP) and cubemap projection (CMP). The experimental results show that the proposed method, which uses the DSI as the CNN input, obtains the best performance: better than that of the cubemap and far superior to that of the equirectangular image. The effect of the image representation becomes more significant as the relative inclination decreases. Moreover, comparative experiments against state-of-the-art methods for spherical camera inclination compensation further illustrate the superiority of the DSI representation. Consequently, the proposed method provides an important reference for the development of CNNs intended for spherical images.
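As a rough illustration of why uniform sampling matters, the sketch below computes how much of the unit sphere a single ERP pixel covers in a given image row. It is not taken from the paper: the function name `erp_pixel_solid_angle` and the 512 x 1024 image size are assumptions chosen for illustration, and the only fact relied on is the standard area of a latitude band on a unit sphere.

```python
import math


def erp_pixel_solid_angle(row, height, width):
    """Solid angle (steradians) covered by one ERP pixel in the given row.

    Hypothetical helper for illustration; row 0 is the row touching the
    north pole and each row spans an equal range of latitude.
    """
    lat_top = math.pi / 2 - math.pi * row / height
    lat_bottom = math.pi / 2 - math.pi * (row + 1) / height
    # Area of the latitude band on the unit sphere, shared by `width` pixels.
    band_area = 2 * math.pi * (math.sin(lat_top) - math.sin(lat_bottom))
    return band_area / width


# Assumed image size, for illustration only.
height, width = 512, 1024
equator_pixel = erp_pixel_solid_angle(height // 2, height, width)
pole_pixel = erp_pixel_solid_angle(0, height, width)
total_area = sum(
    erp_pixel_solid_angle(r, height, width) * width for r in range(height)
)
```

Under these assumptions, a pixel near the equator covers far more of the sphere than a pixel near a pole, so ERP oversamples the polar regions. This non-uniformity is what a representation sampled "as uniformly as possible" is meant to avoid.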

Highlights

  • Because the precision of both the discrete spherical image (DSI) and the cubemap projection (CMP) was above 95%, the convolutional neural network (CNN)-based spherical image inclination estimation task is feasible

  • The results show that training from scratch achieved the worst performance, while fine-tuning networks from pretrained models improved the classification accuracy on the current task

  • The results show that the DSI performs the best among the three image representations, followed by the CMP, while the equirectangular projection (ERP) performs the worst


Summary

Introduction

A. BACKGROUND

1) PIN-HOLE CAMERA MODEL VS. SPHERICAL CAMERA MODEL

A spherical camera is a camera that covers the entire field of view (FOV). Whereas a conventional camera, which captures perspective images, is based on the pin-hole camera model, a spherical camera, which captures spherical images, is described by the spherical camera model. Spherical images are widely used and have been studied in medical science, such as the representation of retinal images of the human eye [22], [49]; geography, such as the representation of the earth [43]; meteorology, such as the computation of atmospheric motion [42]; and computer vision, such as immersive virtual reality [20], [26], [27], visual surveillance [29], augmented reality [25], and robotics [23], [24], [28].
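The difference between the two camera models can be sketched in a few lines. This is a minimal illustration using the textbook definitions, not the paper's code: the function names, the default focal length, and the convention that the camera looks along the positive z axis are all assumptions.

```python
import math


def project_pinhole(point, focal_length=1.0):
    """Pin-hole model: project a 3D point onto the image plane z = focal_length.

    Returns None for points at or behind the camera (z <= 0), which a
    perspective image cannot represent. Names and conventions are assumed.
    """
    x, y, z = point
    if z <= 0:
        return None
    return (focal_length * x / z, focal_length * y / z)


def project_spherical(point):
    """Spherical model: project a 3D point onto the unit sphere.

    Every direction around the camera has an image, including points
    behind it, which is what gives the entire field of view.
    """
    x, y, z = point
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0:
        raise ValueError("the camera centre has no projection")
    return (x / norm, y / norm, z / norm)


in_front = (1.0, 2.0, 2.0)
behind = (0.0, 0.0, -2.0)
pinhole_front = project_pinhole(in_front)
pinhole_behind = project_pinhole(behind)
sphere_behind = project_spherical(behind)
```

In this sketch the point behind the camera has no pin-hole projection but maps to a well-defined point on the sphere, which is the sense in which the spherical model covers the entire FOV.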

