Vision-based hand reconstruction has become a noteworthy tool for enhancing interactive experiences in applications such as virtual reality, augmented reality, and autonomous driving, enabling sophisticated interactions by reconstructing the complex motions of human hands. Despite significant progress driven by deep learning, high-fidelity reconstruction of interacting hands still faces challenges such as limited dataset diversity, a lack of detailed hand representations, occlusions, and the differentiation of visually similar hand structures. This survey thoroughly reviews the deep learning-based methods, diverse datasets, loss functions, and evaluation metrics that address the complexities of interacting-hand reconstruction. Mainstream algorithms of the past five years are systematically classified into two main categories: those that employ explicit representations, such as parametric meshes and 3D Gaussian splatting, and those that utilize implicit representations, including signed distance fields and neural radiance fields. Novel deep learning models, such as graph convolutional networks and transformers, are shown to address these challenges effectively. Beyond summarizing these interaction-aware algorithms, the survey also briefly discusses hand tracking in virtual and augmented reality. To the best of our knowledge, this is the first survey specifically focusing on the reconstruction of both hands and their interactions with objects. By covering the various facets of hand modeling, deep learning approaches, and datasets, the survey broadens the horizon of hand reconstruction research and encourages future innovation in natural user interactions.