Abstract

As a fine-grained semantic segmentation task, human parsing has attracted extensive attention in computer vision. However, without the assistance of heterogeneous information, it is difficult to obtain detailed human parsing results directly. Although some studies have introduced heterogeneous data, such as pose estimation and edge prediction, to guide the human parsing task, the correlations among these heterogeneous data have not been effectively utilized. To mitigate the distribution gap among heterogeneous data, we propose a Heterogeneous Interactive Attention Network (HIANet), which exploits attention between heterogeneous data to capture long-range contextual dependencies. The richly interacting supplementary cues mutually guide multi-source features to correct their respective prediction errors, further refining the human parsing results. Extensive experiments are conducted on three human parsing datasets; on the LIP dataset in particular, the mean accuracy and mean Intersection-over-Union of the proposed HIANet improve by 2.86% and 3.90%, respectively, compared with PGECNet. Our code has been made available at https://github.com/wangwenjiawj/HIANet.
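The interactive attention described above can be pictured as a cross-attention operation between feature maps from different sources, e.g., a parsing branch attending to a pose or edge branch. The following is a minimal, hypothetical sketch of such an interaction module in PyTorch; the class name `HeterogeneousCrossAttention` and all parameter choices are our own illustration and are not taken from the HIANet implementation.

```python
import torch
import torch.nn as nn

class HeterogeneousCrossAttention(nn.Module):
    """Illustrative cross-attention between two heterogeneous feature maps.

    Queries come from the parsing branch; keys/values come from an
    auxiliary branch (e.g., pose or edge features). This is a generic
    sketch under assumed shapes, not the authors' implementation.
    """

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels,
                                          num_heads=heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, parsing_feat: torch.Tensor,
                aux_feat: torch.Tensor) -> torch.Tensor:
        # parsing_feat, aux_feat: (B, C, H, W) feature maps
        b, c, h, w = parsing_feat.shape
        q = parsing_feat.flatten(2).transpose(1, 2)   # (B, H*W, C)
        kv = aux_feat.flatten(2).transpose(1, 2)      # (B, H*W, C)
        # Parsing queries attend to auxiliary keys/values, capturing
        # long-range dependencies across the heterogeneous sources.
        out, _ = self.attn(q, kv, kv)
        out = self.norm(out + q)                      # residual refinement
        return out.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    # Toy usage: 64-channel features at 32x32 spatial resolution.
    parsing = torch.randn(2, 64, 32, 32)
    pose = torch.randn(2, 64, 32, 32)
    refined = HeterogeneousCrossAttention(64)(parsing, pose)
    print(refined.shape)  # torch.Size([2, 64, 32, 32])
```

In this sketch the refined parsing features could symmetrically serve as keys/values for the auxiliary branch, giving the mutual guidance the abstract refers to; the actual fusion strategy should be taken from the released code.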
