Human pose transfer aims to render a conditional person image in a new target pose. The difficulty lies in modeling the large-scale spatial deformation from the conditional pose to the target one. However, the commonly used 2D data representations and one-step flow prediction scheme yield unreliable deformation predictions, because they lack 3D guidance and must bridge large pose changes in a single step. Therefore, to bring the underlying 3D motion information into human pose transfer, we propose to simulate the generation process of real person images: we drive the 3D human model reconstructed from the conditional person image with the target pose and project it onto the 2D plane. The 2D projection thereby inherits the 3D information of the poses, which can guide the flow prediction. Furthermore, we propose a progressive flow prediction network consisting of two streams. One stream predicts the flow by decomposing the complex pose transformation into multiple sub-transformations; the other generates the features of the target image according to the predicted flow. Besides, to improve the reliability of the generated invisible regions, we feed the target pose information, which carries structural cues, from the flow prediction stream into the feature generation stream as supplementary information. The synthesized images, with accurate depth information and sharp details, demonstrate the effectiveness of the proposed method.
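To make the progressive scheme concrete, the following is a minimal NumPy sketch (not the paper's implementation) of the core idea: warping a feature map by a backward flow field, and decomposing one large deformation into several smaller sub-warps applied in sequence. The function names `bilinear_warp` and `progressive_warp`, and the equal splitting of the flow, are illustrative assumptions; the paper's network learns each sub-transformation.

```python
import numpy as np

def bilinear_warp(feat, flow):
    """Warp a feature map by a backward flow field with bilinear sampling.
    feat: (H, W, C) feature map; flow: (H, W, 2) sampling offsets (dy, dx)."""
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Source coordinates, clamped to the image border.
    sy = np.clip(ys + flow[..., 0], 0, H - 1)
    sx = np.clip(xs + flow[..., 1], 0, W - 1)
    y0 = np.floor(sy).astype(int); x0 = np.floor(sx).astype(int)
    y1 = np.minimum(y0 + 1, H - 1); x1 = np.minimum(x0 + 1, W - 1)
    wy = (sy - y0)[..., None]; wx = (sx - x0)[..., None]
    # Blend the four neighbouring feature vectors.
    return (feat[y0, x0] * (1 - wy) * (1 - wx) + feat[y0, x1] * (1 - wy) * wx
            + feat[y1, x0] * wy * (1 - wx) + feat[y1, x1] * wy * wx)

def progressive_warp(feat, total_flow, steps=4):
    """Apply a large deformation as `steps` smaller sub-warps (illustrative:
    the total flow is split evenly; a learned model predicts each sub-flow)."""
    out = feat
    for _ in range(steps):
        out = bilinear_warp(out, total_flow / steps)
    return out
```

For an integer translation the two schemes agree away from the border; in the learned setting, each predicted sub-flow only has to model a small deformation, which is what makes the progressive decomposition easier than one-step prediction.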