Abstract
Parametric human modeling methods are limited to either single-view frameworks or simple multi-view frameworks, failing to fully exploit both the easy trainability of single-view networks and the occlusion resistance of multi-view images. Object occlusion and self-occlusion, prevalent in real-world scenarios, undermine the robustness and accuracy of human body parameter prediction. Moreover, many methods ignore the spatial connectivity of human joints when globally estimating model pose parameters, causing errors to accumulate along consecutive joints. To address these challenges, we propose HIDE, a flexible and efficient iterative decoding strategy. By extending from single-view images to multi-view video inputs, we achieve local-to-global optimization. We use attention mechanisms to capture the rotational dependencies between any joint of the human body and all of its ancestor joints, strengthening pose decoding. We fuse multi-view image data iteratively at the parameter level to integrate global pose information flexibly, rapidly obtaining suitable projection features from different viewpoints and ultimately yielding precise parameter estimates. Experiments on the Human3.6M and 3DPW datasets validate the effectiveness of HIDE, with markedly better visualization results than previous methods.
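The ancestor-dependency idea above can be illustrated with a minimal sketch: masked attention in which each joint attends only to itself and its ancestors along the kinematic chain. The toy kinematic tree, feature sizes, and function names below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Toy kinematic tree: index -> parent index (-1 marks the root joint).
# This tree is hypothetical; real human models (e.g. SMPL) use ~24 joints.
PARENTS = [-1, 0, 1, 2, 0, 4]

def ancestor_mask(parents):
    """Boolean mask M[i, j] = True iff j is joint i itself or an ancestor of i."""
    n = len(parents)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        j = i
        while j != -1:          # walk up the chain to the root
            mask[i, j] = True
            j = parents[j]
    return mask

def ancestor_attention(feats, parents):
    """Scaled dot-product attention restricted to each joint's ancestors."""
    n, d = feats.shape
    scores = feats @ feats.T / np.sqrt(d)
    scores[~ancestor_mask(parents)] = -np.inf   # block non-ancestor joints
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ feats      # each joint's feature mixes ancestor features

feats = np.random.default_rng(0).normal(size=(6, 8))
out = ancestor_attention(feats, PARENTS)
print(out.shape)  # (6, 8)
```

Because the root joint has no ancestors, its output is exactly its own feature vector, while a leaf joint's output blends the features of its entire chain, which is the dependency structure the abstract describes.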