The task of fashion parsing aims to assign pixel-level labels to clothing targets; thereby, parsing models are required to have good contextual recognition ability. However, the shapes of clothing components are complex, and the types are difficult to distinguish. Recent solutions focus on improving datasets and supplying abundant priori information, but the utilization of features by more efficient methods is rarely explored. In this paper, we propose a multi-scale fashion parsing model called the Priori Dictionary Network (PDN), which includes a priori attention module and a multi-scale backbone. The priori attention module extracts high dimensional features from our designed clothing average template as a priori information dictionary (priori dictionary, PD), and the PD is utilized to activate the feature maps of a CNN from a multi-scale attention mechanism. The backbone is derived from classical models, and five side paths are designed to leverage the richer features of local and global contextual representations. To measure the performance of our method, we evaluated the model on four public datasets, the CFPD, UTFR-SBD3, ModaNet and LIP, and the experimental results show that our model stands out from other State of the Art in all four datasets. This method can assist with the labeling problem of clothing datasets.
Read full abstract