Abstract

To combine RGB image features with depth image features effectively for human detection, this paper proposes a two-stream RGB-D human detection algorithm based on the RFB network. The proposed algorithm has three main parts: the RGB-stream, the Depth-stream, and a Channel Weight Fusion (CWF) strategy. (1) The RGB-stream extracts RGB image features using RFB-Net as the backbone network. (2) The Depth-stream, designed by analyzing visualizations of depth features, extracts the depth image features effectively. (3) The improved CWF strategy strengthens the contribution of important channels in the RGB-D fusion features and improves the representational capacity of the network. Experimental results on two commonly used datasets show that the proposed algorithm improves significantly on other algorithms.
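
The abstract does not give the CWF formulation, so the following is only a minimal sketch of the general idea of channel weighting over fused RGB-D features. It assumes a squeeze-and-excitation-style gate: each channel is summarized by its global average, and a sigmoid of that summary becomes the channel's weight. The names `channel_weight_fusion`, `gate_scale` and `gate_bias` are hypothetical and do not come from the paper. In a trained network the gate parameters would be learned rather than fixed, and feature maps would be tensors rather than nested lists.

```python
import math


def global_average(channel):
    """Mean of all values in one 2D feature map (a list of rows)."""
    total = sum(sum(row) for row in channel)
    count = sum(len(row) for row in channel)
    return total / count


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_weight_fusion(rgb_feats, depth_feats, gate_scale=1.0, gate_bias=0.0):
    """Concatenate RGB and depth channels, then rescale each channel.

    Assumption (not from the paper): the weight of a channel is
    sigmoid(gate_scale * mean(channel) + gate_bias).
    Returns the reweighted channels and the list of weights.
    """
    # Channel-wise concatenation of the two streams.
    fused = rgb_feats + depth_feats
    # One scalar weight in (0, 1) per fused channel.
    weights = [
        sigmoid(gate_scale * global_average(ch) + gate_bias) for ch in fused
    ]
    # Multiply every value in a channel by that channel's weight.
    reweighted = [
        [[w * v for v in row] for row in ch] for ch, w in zip(fused, weights)
    ]
    return reweighted, weights


# Toy input: one RGB channel and two depth channels, each 2x2.
rgb = [[[1.0, 1.0], [1.0, 1.0]]]
depth = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[-2.0, -2.0], [-2.0, -2.0]],
]
out, w = channel_weight_fusion(rgb, depth)
print(len(out), w)
```

With this toy input, the channel with the largest mean activation receives the largest weight, which is the sense in which a gate of this kind emphasizes some channels over others. How the paper's improved CWF strategy computes its weights may differ.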

Highlights

  • The fields of smart building and intelligent security are developing rapidly, and human detection has become a hot research topic in these fields. In recent years, many researchers have done considerable work on detecting humans in RGB images [2]–[8] and achieved good detection results

  • The keys to human detection that combines RGB images with depth images are extracting depth image features effectively and making good use of RGB-D fusion features

  • The results show that the proposed algorithm can extract effective depth image features and enhance the effectiveness of important channels in RGB-D fusion features


Summary

Introduction

The fields of smart building and intelligent security are developing rapidly, and human detection has become a hot research topic in these fields. Many researchers have done considerable work on detecting humans in RGB images [2]–[8] and achieved good detection results. However, RGB images are affected by factors such as occlusion of the human body, changes in human pose, illumination changes and complex backgrounds. Compared with RGB images, depth images are not affected by illumination changes and make it easier to obtain low-noise object contours. Li et al. [21] proposed an attention steered interweave fusion network (ASIF-Net) to detect salient objects. Han et al. [23] proposed a multiview CNN fusion model that detects salient objects by using a combination layer to connect the representation layers of multiple views.

