Abstract
The human parsing task is dedicated to allocating pixel-level fine-grained semantic labels. However, recent solutions are limited to fully utilize the prior information (poses and edges) and the potential information of the image data, remaining problems with ambiguous boundaries, incomplete human parts, and redundant labels. To solve the above problems, we propose a novel Multi-view Stack Network (MVSN), which is constructed by three stacks with multiple views of features included parts, edges, and pre-segmentation. Meanwhile, a channel correlator is developed to acquire the correlation between local and global information better. Comprehensive experiments and corresponding results on three public datasets show that the proposed MVSN performs favorably against the state-of-the-art methods.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have