Abstract
To effectively combine RGB image features with depth image features for human detection, this paper proposes a two-stream RGB-D human detection algorithm based on the RFB network. The proposed algorithm contains three main parts: an RGB-stream, a Depth-stream, and a Channel Weight Fusion (CWF) strategy. (1) The RGB-stream extracts RGB image features using RFB-Net as the backbone network. (2) Guided by visualizations of depth features, we build the Depth-stream, which effectively extracts depth image features. (3) The improved CWF strategy enhances the contribution of important channels in the fused RGB-D features and improves the network's representational capability. Experimental results show that the proposed algorithm achieves significant improvements over other algorithms on two common datasets.
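The abstract does not specify how channel weighting is computed, so as an illustration only, the sketch below shows a generic squeeze-and-excitation-style channel weighting over concatenated RGB and depth feature maps. The function name, weight shapes, and the choice of global average pooling with a sigmoid gate are assumptions, not the paper's exact CWF strategy.

```python
import numpy as np

def channel_weight_fusion(rgb_feat, depth_feat, w1, w2):
    """Hypothetical channel-weighted fusion of RGB-D features.

    rgb_feat, depth_feat: arrays of shape (C, H, W).
    w1: (C_mid, 2C) and w2: (2C, C_mid) are learned projection
    weights (here supplied by the caller for illustration).
    """
    # Concatenate the two streams along the channel axis: (2C, H, W).
    fused = np.concatenate([rgb_feat, depth_feat], axis=0)
    # Squeeze: global average pooling gives one descriptor per channel.
    squeeze = fused.mean(axis=(1, 2))                # (2C,)
    # Excitation: bottleneck MLP with ReLU, then sigmoid gate in (0, 1).
    hidden = np.maximum(w1 @ squeeze, 0.0)           # (C_mid,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # (2C,)
    # Rescale each channel by its learned importance weight.
    return fused * weights[:, None, None]
```

In this reading, channels that the gate scores near 1 pass through almost unchanged while low-scoring channels are suppressed, which is one common way to "enhance the effectiveness of important channels" in a fused feature map.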
Highlights
The fields of smart building and intelligent security are developing rapidly, and human detection has become a hot research topic in these fields. In recent years, many researchers have done considerable work on using RGB images to detect humans [2]–[8] and achieved good detection results.
How to effectively extract depth image features and how to utilize the fused RGB-D features are the keys to human detection that combines RGB images with depth images.
The results show that the proposed algorithm extracts effective depth image features and enhances the contribution of important channels in the fused RGB-D features.
Summary
Many researchers have done considerable work on using RGB images to detect humans [2]–[8] and achieved good detection results. However, RGB images are affected by factors such as occlusion, changes in human pose, illumination changes, and complex backgrounds. Compared with RGB images, depth images are unaffected by illumination changes and make it easier to obtain low-noise object contours. Li et al. [21] proposed an attention-steered interweave fusion network (ASIF-Net) to detect salient objects. Han et al. [23] proposed a multiview CNN fusion model that connects the representation layers of multiple views through a combination layer to detect salient objects.