Human body pose estimation represented as joint rotations is essential for driving virtual characters. This paper develops a novel end-to-end point-to-pose mesh fitting network (P2P-MeshNet) that directly estimates body joint rotations. P2P-MeshNet tightly couples a deep learning network, an inverse kinematics network for body pose estimation (IKNet-body), with a self-correcting network, an iterative error feedback network (IEF). P2P-MeshNet was applied to the free mocap (FreeMocap) dataset, which contains OpenPose 3D joint locations reconstructed from multi-view OpenPose 2D joint locations. The estimated joint rotations were evaluated after Procrustes alignment using the mean per joint position error (MPJPE), the percentage of correct keypoints (PCK), and the area under the PCK curve (AUC) over a threshold range of 0–60 mm. Based on the compared metrics, P2P-MeshNet, with an estimation error of 11.31 mm, a success rate of 99.7%, and an AUC of 80.9, proved to be a more consistent tool for future human body pose estimation. Its runtime of 100 frames per second indicates potential for practical application.
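The evaluation metrics named above can be illustrated with a minimal NumPy sketch. It is not the authors' evaluation code: the function names (`procrustes_align`, `per_joint_errors`, `mpjpe`, `pck`, `pck_auc`), the 31 evenly spaced thresholds, and the choice to average PCK over those thresholds as the AUC are assumptions made for illustration. Joint positions are assumed to be in millimetres, with shape `(J, 3)`.

```python
import numpy as np


def procrustes_align(pred, gt):
    """Align pred (J, 3) to gt (J, 3) with the best similarity transform
    (scale, rotation, translation) in the least-squares sense."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # SVD of the 3x3 cross-covariance between centred point sets.
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a proper rotation.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g


def per_joint_errors(pred, gt, align=True):
    """Euclidean distance per joint, optionally after Procrustes alignment."""
    if align:
        pred = procrustes_align(pred, gt)
    return np.linalg.norm(pred - gt, axis=-1)


def mpjpe(errors):
    """Mean per joint position error."""
    return float(errors.mean())


def pck(errors, threshold_mm):
    """Fraction of joints whose error is within the threshold."""
    return float((errors <= threshold_mm).mean())


def pck_auc(errors, max_mm=60.0, steps=31):
    """Normalised area under the PCK curve, approximated as the mean PCK
    over evenly spaced thresholds in [0, max_mm] (step count is assumed)."""
    thresholds = np.linspace(0.0, max_mm, steps)
    return float(np.mean([pck(errors, t) for t in thresholds]))
```

These functions return PCK and AUC as fractions in [0, 1]; multiplying by 100 gives values on the percentage scale used for the figures reported above. In practice the per-joint errors from all test frames would be pooled before computing the three metrics.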