With the rapid development of AR/VR technologies, achieving natural and seamless human-scene interaction has become a critical challenge in computer vision. Existing methods suffer from low model-placement accuracy and unnatural scene interactions. We therefore propose human-scene interaction with geometric and physical constraints (GP-HSI), a framework that places a 3D human model with a given pose at an appropriate position in a 3D scene by establishing geometric and physical constraints, while ensuring interactive fidelity between the human and the scene. Specifically, we first propose a pose-guided human contact semantic generation method, which generates human semantic labels by classifying the given human poses. Second, we propose a geometrically and semantically constrained human model placement method, which determines the optimal position of the human model in the scene by constraining the geometric proximity and semantic consistency between the models. Third, we propose an inverse-kinematics-based pose adjustment method, which finds target human-scene interaction points by constructing a heterogeneous kinematic tree and solves for the rotation matrices of the human joints to obtain a physically plausible optimal pose. Finally, we develop an interactive system to visualize the generated human-scene interactions. Qualitative and quantitative experiments show that our approach places human models at appropriate locations in the scene and generates plausible interactions.
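To make the placement step concrete, the following is a minimal sketch of how a geometrically and semantically constrained placement search might look, assuming the scene is given as labeled surface points and the human model as labeled contact vertices. All names here (`score_placement`, `best_placement`, the label sets, the Gaussian proximity kernel, and the candidate-translation grid) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch of geometry- and semantics-constrained placement.
# Assumption: scene = labeled point cloud; human = labeled contact vertices.
import numpy as np

def score_placement(human_pts, human_labels, scene_pts, scene_labels,
                    offset, sigma=0.05):
    """Score one candidate translation: higher when contact vertices lie
    close to scene points carrying a matching semantic label."""
    placed = human_pts + offset                           # translate human model
    # pairwise distances between contact vertices and scene points
    d = np.linalg.norm(placed[:, None, :] - scene_pts[None, :, :], axis=-1)
    nn = d.argmin(axis=1)                                 # nearest scene point
    geo = np.exp(-d[np.arange(len(placed)), nn] / sigma)  # geometric proximity
    sem = (human_labels == scene_labels[nn]).astype(float)  # semantic consistency
    return float((geo * sem).mean())

def best_placement(human_pts, human_labels, scene_pts, scene_labels, candidates):
    """Evaluate all candidate translations and return the highest-scoring one."""
    scores = [score_placement(human_pts, human_labels,
                              scene_pts, scene_labels, c) for c in candidates]
    return candidates[int(np.argmax(scores))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene_pts = rng.uniform(0.0, 2.0, size=(500, 3))
    scene_labels = rng.integers(0, 3, size=500)    # e.g. floor/seat/backrest
    human_pts = rng.uniform(0.0, 0.3, size=(20, 3))
    human_labels = rng.integers(0, 3, size=20)     # per-vertex contact labels
    candidates = rng.uniform(0.0, 2.0, size=(50, 3))  # candidate translations
    print("best offset:", best_placement(human_pts, human_labels,
                                         scene_pts, scene_labels, candidates))
```

The design choice this sketch illustrates is that the two constraints multiply rather than add: a candidate position only scores well where a contact vertex is both geometrically close to a scene surface and semantically compatible with it, which is the intuition behind combining geometric proximity with semantic consistency.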