Abstract

In human parsing, graph convolutional networks (GCNs), which naturally model the skeleton of the human body as a fixed graph, have achieved remarkable performance. However, existing methods apply the same fixed graph to all training samples. A single fixed graph may not be optimal given the diversity of the samples, which contain various shapes of human parts, complex body postures, severe occlusions, dense crowds, etc. To address this, we propose a new Multilabel Learning based Adaptive Graph Convolutional Network (ML-AGCN) for human parsing. The ML-AGCN comprises three components: an adaptive graph generation module, a semantic parts based attention module, and a label consistency loss. Concretely, because the optimal graph differs in size and connectivity from sample to sample, we first propose an adaptive graph generation module based on multilabel learning that contains graph node adaptation (GNA) and graph connection adaptation (GCA). Then, to obtain a more comprehensive node embedding, we design a semantic parts based attention module that fuses the fixed graph embeddings with the adaptive graph embeddings. In addition, to explicitly constrain the consistency between the predicted multilabel and the predicted human parsing results, we propose a label consistency loss that simultaneously refines the human parsing results and improves the accuracy of the adaptive graph. Extensive experiments on four challenging datasets, PASCAL-Person-Part, ATR, LIP and CIHP, demonstrate the effectiveness of our model, which outperforms other state-of-the-art methods in human parsing.
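The abstract gives no formulas, so the following is only a rough NumPy sketch of how the three components could fit together, not the authors' implementation. Every function name, the 0.5 presence threshold, the per-node convex fusion, and the squared-error form of the consistency term are assumptions made for illustration.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize a non-negative adjacency matrix after adding self-loops."""
    adj = adj + np.eye(adj.shape[0])
    deg = adj.sum(axis=1)              # >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(normalize_adjacency(adj) @ feats @ weight, 0.0)

def adaptive_adjacency(fixed_adj, part_probs, threshold=0.5):
    """Assumed stand-in for GNA/GCA: keep an edge only if the multilabel
    prediction says both of its body parts are present in the image."""
    present = (part_probs >= threshold).astype(float)
    return fixed_adj * np.outer(present, present)

def fuse(fixed_emb, adaptive_emb, attn):
    """Assumed attention fusion: per-part convex combination of the two
    embeddings, with attn of shape (num_parts,) and values in [0, 1]."""
    a = attn[:, None]
    return a * fixed_emb + (1.0 - a) * adaptive_emb

def label_consistency_loss(part_probs, parsing_probs):
    """Assumed consistency term: a part's image-level presence implied by the
    parsing map (max over pixels) should match the multilabel prediction.
    parsing_probs has shape (num_parts, height, width)."""
    presence_from_parsing = parsing_probs.max(axis=(1, 2))
    return float(np.mean((part_probs - presence_from_parsing) ** 2))

# Toy example: 4 parts connected in a chain; part 2 is predicted absent.
fixed_adj = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
part_probs = np.array([0.9, 0.8, 0.1, 0.7])

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))        # one feature vector per part
weight = rng.normal(size=(8, 8))

adj = adaptive_adjacency(fixed_adj, part_probs)
fixed_emb = gcn_layer(fixed_adj, feats, weight)
adaptive_emb = gcn_layer(adj, feats, weight)
fused = fuse(fixed_emb, adaptive_emb, attn=np.full(4, 0.5))
```

In this toy case the adaptive graph drops both edges touching the absent part, so that part only propagates its own features, while the fixed-graph branch still uses the full skeleton.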
