Abstract

Existing multi-person reconstruction methods require the human bodies in the input image to occupy a considerable portion of the picture. However, low-resolution human subjects are ubiquitous because of the trade-off between field of view and target distance at a limited camera resolution. In this paper, we propose an end-to-end multi-task framework for multi-person inference from a low-resolution image (MILI). To extract more information from a low-resolution image, we train on paired high-resolution and low-resolution images and design a restoration network with a simple loss for better feature extraction from the low-resolution image. To address the occlusion problem in multi-person scenes, we propose an occlusion-aware mask prediction network that estimates the mask of each person during 3D mesh regression. Experimental results on both small-scale and large-scale scenes demonstrate that our method outperforms state-of-the-art methods both quantitatively and qualitatively. The code is available at http://cic.tju.edu.cn/faculty/likun/projects/MILI.

