This paper introduces a novel approach for warping an RGB image to a new target pose, which learns to inpaint the parts of the image that become invisible by sampling pixels from the visible parts. The technique is particularly relevant to applications such as novel-pose image generation, where the quality of the generated samples depends heavily on the warped images. Conventional methods typically employ an affine warping-map estimator and learn the warping map through a downstream image generation task; consequently, they return an affine function regardless of how complex the deformation between the source and target poses is. In contrast, our method estimates the warping map with a convolutional function trained by alternating between two downstream tasks. First, the estimator treats warping as part of an image generation task, as in existing methods but without the affine constraint, which encourages it to capture transformations beyond affine deformations. Second, setting the image generation task aside, we supervise the convolutional warping map to recover any affine transformation relating the guiding poses of the sample. Our experiments demonstrate that the method is highly effective at preserving image textures while transforming images into highly complex poses.
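
To make the alternating scheme concrete, the following PyTorch-style sketch shows one way such training could be organized. It is an illustrative assumption, not the paper's implementation: the network architectures, the pose encoding (`POSE_CH` keypoint heatmaps), the L1 losses, and the `random_affine` helper are all hypothetical stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

POSE_CH = 18  # assumed keypoint-heatmap pose encoding (OpenPose-style)

class WarpEstimator(nn.Module):
    """Hypothetical convolutional warp estimator: maps (source image,
    source pose, target pose) to a dense sampling grid for grid_sample."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 2 * POSE_CH, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 2, 3, padding=1),  # 2-channel (x, y) sampling coords
        )

    def forward(self, img, src_pose, tgt_pose):
        grid = self.net(torch.cat([img, src_pose, tgt_pose], dim=1))
        return grid.permute(0, 2, 3, 1)  # (B, H, W, 2), as grid_sample expects

class Generator(nn.Module):
    """Hypothetical downstream generator refining the warped image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + POSE_CH, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, warped, tgt_pose):
        return self.net(torch.cat([warped, tgt_pose], dim=1))

def random_affine(batch, device, max_rot=0.3, max_shift=0.1):
    """Sample random 2x3 affine matrices (small rotation + translation)."""
    ang = (torch.rand(batch, device=device) * 2 - 1) * max_rot
    cos, sin = torch.cos(ang), torch.sin(ang)
    shift = (torch.rand(batch, 2, device=device) * 2 - 1) * max_shift
    return torch.stack([
        torch.stack([cos, -sin, shift[:, 0]], dim=1),
        torch.stack([sin, cos, shift[:, 1]], dim=1),
    ], dim=1)  # (B, 2, 3)

def train_step(step, warp_net, generator, batch, opt):
    img, src_pose, tgt_pose, tgt_img = batch
    if step % 2 == 0:
        # Task 1: learn the warp through image generation, with no
        # affine constraint on the predicted sampling grid.
        grid = warp_net(img, src_pose, tgt_pose)
        warped = F.grid_sample(img, grid, align_corners=False)
        loss = F.l1_loss(generator(warped, tgt_pose), tgt_img)
    else:
        # Task 2: ignore the generator and supervise the grid directly
        # on a known affine deformation between the guiding poses.
        theta = random_affine(img.size(0), img.device)
        gt_grid = F.affine_grid(theta, list(img.shape), align_corners=False)
        aff_pose = F.grid_sample(src_pose, gt_grid, align_corners=False)
        loss = F.l1_loss(warp_net(img, src_pose, aff_pose), gt_grid)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Under these assumptions, alternating by step parity lets the unconstrained generation task push the convolutional estimator toward complex deformations, while the affine-supervision task anchors it whenever an affine relation between the guiding poses exists.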