Abstract

Parsing of human images is a fundamental task for determining semantic parts such as the face, arms, and legs, as well as a hat or a dress. Recent deep-learning-based methods have achieved significant improvements, but collecting training datasets with pixel-wise annotations is labor-intensive. In this paper, we propose two solutions to cope with limited datasets. Firstly, to handle various poses, we incorporate a pose estimation network into an end-to-end human-image parsing network, in order to transfer common features across the domains. The pose estimation network can be trained using rich datasets and can feed valuable features to the human-image parsing network. Secondly, to handle complicated backgrounds, we increase the variation in image backgrounds automatically by replacing the original backgrounds of human images with others obtained from large-scale scenery image datasets. Individually, each solution is versatile and beneficial to human-image parsing, while their combination yields further improvement. We demonstrate the effectiveness of our approach through comparisons and various applications such as garment recoloring, garment texture transfer, and visualization for fashion analysis.
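As a rough illustration of the first solution, the sketch below shows one way intermediate features from a pre-trained pose estimation branch could be concatenated with a parsing network's features before per-pixel classification. This is a minimal PyTorch-style sketch under our own assumptions; the module names, channel counts, and fusion point are illustrative and not the paper's exact architecture.

```python
# Minimal sketch (assumed architecture, not the paper's exact design):
# features from a pose estimation backbone, pre-trained on rich pose datasets,
# are concatenated with the parsing branch's features before the final
# per-pixel classification layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParsingWithPoseFeatures(nn.Module):
    def __init__(self, pose_backbone: nn.Module, pose_channels: int,
                 parse_channels: int = 64, num_labels: int = 18):
        super().__init__()
        self.pose_backbone = pose_backbone      # assumed pre-trained pose branch
        self.parse_encoder = nn.Sequential(     # stand-in for a Co-CNN-style encoder
            nn.Conv2d(3, parse_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(parse_channels, parse_channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Per-pixel classifier over the fused (parsing + pose) feature maps.
        self.classifier = nn.Conv2d(parse_channels + pose_channels, num_labels, 1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        parse_feat = self.parse_encoder(image)
        with torch.no_grad():                   # keep the pose branch fixed here
            pose_feat = self.pose_backbone(image)
        # Resize pose features to the parsing resolution, then fuse and classify.
        pose_feat = F.interpolate(pose_feat, size=parse_feat.shape[-2:],
                                  mode="bilinear", align_corners=False)
        fused = torch.cat([parse_feat, pose_feat], dim=1)
        return self.classifier(fused)           # (N, num_labels, H, W) label scores
```

Whether the pose branch is frozen or fine-tuned, and where the fusion happens, are design choices this sketch does not fix; the point is only that pose features learned from large datasets can feed the parsing branch.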

Highlights

  • Human-image parsing is the image-processing task of assigning semantic labels to human body parts and clothing regions such as the face, arms, and legs, as well as a hat, dress, etc. This task plays a crucial role in various applications in computer graphics and computer vision, e.g., virtual fitting systems [1], clothing retrieval [2], and recommendation [3, 4].

  • The contextualized convolutional neural network (Co-CNN) [6] is a CNN devised to improve the performance of human-image parsing.

  • Recall that our purpose is to improve the performance of human-image parsing when limited training data are available; we therefore compare the baseline Co-CNN against our data augmentation method (DA), pose estimation information (PE), and their combination (DA+PE). A minimal sketch of the background-replacement augmentation follows below.
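The background-replacement augmentation can be pictured as follows. This is a hedged sketch assuming pixel-wise part labels are available for each training image (with label 0 denoting background) and a directory of scenery photos; the paths, file layout, and label convention are assumptions, not the paper's exact pipeline.

```python
# Hedged sketch: background-replacement data augmentation. Assumes each training
# image comes with a pixel-wise label map (label 0 = background) and that scenery
# photos are stored under SCENERY_DIR; these names and conventions are assumed.
import random
from pathlib import Path

import numpy as np
from PIL import Image

SCENERY_DIR = Path("scenery_images")           # assumed location of scene photos
scenery_paths = sorted(SCENERY_DIR.glob("*.jpg"))

def replace_background(human_img: Image.Image, label_map: np.ndarray) -> Image.Image:
    """Composite the annotated human foreground onto a random scenery image."""
    # Pick a random scenery image and resize it to the human image's size.
    bg = Image.open(random.choice(scenery_paths)).convert("RGB")
    bg = bg.resize(human_img.size, Image.BILINEAR)

    # Foreground mask: every pixel whose label is not background (label 0).
    fg_mask = (label_map != 0).astype(np.uint8) * 255
    mask = Image.fromarray(fg_mask, mode="L")

    # Paste the human region onto the new background; the label map is unchanged,
    # so each augmented image reuses the original pixel-wise annotation.
    augmented = bg.copy()
    augmented.paste(human_img, (0, 0), mask)
    return augmented
```

Because the labels themselves are untouched, each original annotation can be reused across many generated backgrounds, which is what makes the augmentation cheap relative to new manual labeling.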


Summary

Introduction

Human-image parsing is the image-processing task of assigning semantic labels to human body parts and clothing regions such as the face, arms, and legs, as well as a hat, dress, etc. Recent human-image parsing methods using deep learning have exhibited significant improvements. Such methods require a sufficiently large training dataset in order to cope with various human poses and complicated background images. Training data are usually produced by manually annotating images with pixel-wise labels, which is quite tedious and costly even with crowdsourcing. This leads to the following research question: "Can we improve human-image parsing using a limited training dataset?" We show several applications, such as garment recoloring, garment texture transfer, and visualization for fashion analysis, using our human-image parsing results.

Related work
Convolutional pose machines
Contextualized CNN
Transferring information from pose estimation
Learning
Augmenting background variations
Settings
Evaluation methods
Results
Applications
Garment texture transfer
Visualization for fashion analysis
Conclusions and future work