Many recognition tasks, including image and video classification, segmentation, and object detection, can be improved by integrating global information. Although global information may be better represented in some recognition tasks than in others, it is worth exploring how global information from related tasks can be used effectively to improve the performance of a target task. Pose estimation predicts the locations of human joints and thus provides global information about the human body. In this paper, we propose a pose-aware global representation network (PAGRnet) that exploits global information from pose estimation to enhance feature learning for human parsing. In our PAGRnet model, a novel learning module with three integrated parts learns this global information: the first part generates a global joint representation, the second part learns the relationship between pixels and joints, and the third part integrates the global joint representation with the pixel-joint relationship to produce a pose-aware global representation that augments the features used for parsing. Our experiments show that our method achieves competitive performance on the LIP, Pascal-Person-Part, and ATR datasets with lower computation costs than other global-information-fusion approaches. We also demonstrate the advantages of our feature fusion model over concatenation and over pixel-wise and channel-wise relation models.
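To make the three-part module concrete, the sketch below gives a minimal PyTorch rendering of the pipeline the abstract describes, under our own assumptions: the heatmap-weighted pooling, the softmax pixel-joint attention, and the concatenation-based fusion are illustrative choices rather than the paper's confirmed design, and all names (PoseAwareGlobalRepresentation, feats, heatmaps) are hypothetical.

```python
import torch
import torch.nn as nn


class PoseAwareGlobalRepresentation(nn.Module):
    """Illustrative three-part module: (1) pool a global representation per
    joint, (2) relate every pixel to every joint, (3) project the joint
    representations back onto pixels and fuse with the parsing features."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)   # pixel queries
        self.key = nn.Linear(channels, channels)                    # joint keys
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor, heatmaps: torch.Tensor) -> torch.Tensor:
        # feats:    (B, C, H, W) parsing features from the backbone
        # heatmaps: (B, K, H, W) pose heatmaps, one channel per joint
        b, c, h, w = feats.shape

        # Part 1: global joint representation -- heatmap-weighted pooling of
        # pixel features yields one C-dim vector per joint: (B, K, C).
        weights = heatmaps.flatten(2).softmax(dim=-1)               # (B, K, HW)
        joints = torch.bmm(weights, feats.flatten(2).transpose(1, 2))

        # Part 2: pixel-joint relationship -- each pixel attends over joints.
        q = self.query(feats).flatten(2).transpose(1, 2)            # (B, HW, C)
        k = self.key(joints)                                        # (B, K, C)
        rel = torch.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)  # (B, HW, K)

        # Part 3: integration -- distribute the joint representations to
        # pixels via the relation, then fuse with the original features.
        global_rep = (rel @ joints).transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([feats, global_rep], dim=1))


# Usage sketch: 256-channel features and 16 joints (a LIP-style pose layout).
module = PoseAwareGlobalRepresentation(channels=256)
out = module(torch.randn(2, 256, 32, 32), torch.randn(2, 16, 32, 32))
print(out.shape)  # torch.Size([2, 256, 32, 32])
```

Compared with plain concatenation of pose and parsing features, a pooled-and-redistributed design like this keeps the pixel-joint relation explicit while touching only K joint vectors per image, which is consistent with the reduced computation cost the abstract claims.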