Abstract

Human parsing, which aims to resolve the human body and clothes in a human image into semantic part regions, is a fundamental task in human-centric analysis. Recently, approaches to human parsing based on deep convolutional neural networks (DCNNs) have made significant progress. However, hierarchically exploiting multiscale and spatial contexts as convolutional features remains a hurdle. To boost the scale and spatial awareness of a DCNN, we propose two effective structures, named “Attention SPP” and “Attention RefineNet,” which together form a Mutual Attention operation that exploits multiscale and spatial semantics differently from existing approaches. Moreover, we propose a novel Attention Guidance Network (AG-Net), a simple yet effective architecture that addresses human parsing without bells and whistles (such as human pose and edge information). Comprehensive evaluations on two public datasets demonstrate that AG-Net outperforms state-of-the-art networks.

Highlights

  • Human parsing, which segments a human image into regions of semantic parts, has recently received considerable interest in computer vision

  • To exploit multiscale and spatial awareness with an attention-oriented philosophy, we propose an efficient Attention Guidance Network (AG-Net) for human parsing, shown in Figure 2

  • Guided by Mutual Attention, Spatial Pyramid Pooling (SPP) and RefineNet gain a stronger capacity to exploit multiscale and spatial semantics. Therefore, the whole model is designed with an attention-guided philosophy, which aims at selectively emphasising informative features and restraining less useful ones, and the network has a much stronger awareness for handling the complicated multiscale- and spatial-oriented features in the human parsing task
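
The idea of selectively emphasising informative features and restraining less useful ones can be illustrated with a small channel-gating sketch. This is not the Attention SPP or Attention RefineNet design, whose details are not given here. It assumes a squeeze-and-excitation style gate, and the function name `channel_gate` and the toy feature maps are hypothetical. A real network would place learned layers between the pooling step and the sigmoid.

```python
import math


def channel_gate(features):
    """Rescale each channel by a gate in (0, 1).

    features: list of channels, each a 2-D list (H x W) of floats.
    Returns (gates, recalibrated_features).
    """
    # Squeeze: one global average per channel.
    means = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Gate: plain sigmoid of the mean (assumption: no learned weights here).
    gates = [1.0 / (1.0 + math.exp(-m)) for m in means]
    # Recalibrate: multiply every value in a channel by that channel's gate.
    recalibrated = [
        [[g * value for value in row] for row in channel]
        for g, channel in zip(gates, features)
    ]
    return gates, recalibrated


# Two toy 2x2 channels: one strongly activated, one weakly activated.
features = [
    [[2.0, 2.0], [2.0, 2.0]],
    [[-2.0, -2.0], [-2.0, -2.0]],
]
gates, recalibrated = channel_gate(features)
```

In this toy input the strongly activated channel receives a gate above 0.5 and the weakly activated one a gate below 0.5, so the first is largely preserved while the second is attenuated.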


Summary

Introduction

Human parsing, which segments a human image into regions of semantic parts, has recently received considerable interest in computer vision. The Spatial Pyramid Pooling (SPP) [10,11,12] and RefineNet [13] approaches, where parallel convolution layers with different receptive fields are used to capture multiscale information, are two prevalent strategies for overcoming the difficulty of exploiting multiscale context. These multibranch methods employ only a concatenation or an addition operation to achieve feature fusion, producing feature redundancies and suppressing the representation capacity of the whole network. Our contributions are as follows. (i) To overcome the feature redundancies and spatial semantic limitations in SPP and RefineNet, we propose Attention SPP and Attention RefineNet, which together form Mutual Attention to recalibrate models. (ii) A portable and powerful architecture, named Attention Guidance Network (AG-Net), is designed to boost the multiscale and spatial semantic representation ability of a deep learning model and achieve strong human parsing performance. The remainder of this paper is organized as follows. We describe each part of the proposed network in detail in Section 3. The experiments and conclusions are provided in Sections 4 and 5, respectively.
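
The multibranch fusion discussed above can be sketched on a 1-D signal. This is only an illustration under stated assumptions: average pooling with different window sizes stands in for parallel convolution layers with different receptive fields, and the helper names (`avg_pool_same`, `multibranch`, `fuse_concat`, `fuse_add`) are hypothetical rather than taken from SPP or RefineNet.

```python
def avg_pool_same(signal, window):
    """Average over a centred window, clipped at the borders.

    The output has the same length as the input.
    """
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def multibranch(signal, windows=(1, 3, 5)):
    """Run one pooling branch per window size (one receptive field each)."""
    return [avg_pool_same(signal, w) for w in windows]


def fuse_concat(branches):
    """Concatenation: each position keeps one value per branch."""
    return [list(values) for values in zip(*branches)]


def fuse_add(branches):
    """Addition: each position sums the branch responses."""
    return [sum(values) for values in zip(*branches)]


signal = [0.0, 0.0, 3.0, 0.0, 0.0]
branches = multibranch(signal)
concatenated = fuse_concat(branches)
added = fuse_add(branches)
```

Concatenation keeps every branch response, so the feature width grows with the number of branches, while addition weights all branches equally. Neither rule decides which scale matters at a given position, which is the gap an attention-based recalibration is meant to fill.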

