Human motion capture (mocap) data is crucial for realistic character animation, yet missing optical markers, caused by markers falling off or being occluded, often limit its usefulness in real-world applications. Although great progress has been made in mocap data recovery, it remains a challenging task, primarily due to the articulated complexity of the human body and the long-term dependencies in movements. To address these concerns, this paper proposes an efficient mocap data recovery approach using a Relationship-aggregated Graph Network and Temporal Pattern Reasoning (RGN-TPR). The RGN comprises two tailored graph encoders: a local graph encoder (LGE) and a global graph encoder (GGE). By dividing the human skeleton into several parts, the LGE encodes high-level semantic node features and their semantic relationships within each local part, while the GGE aggregates the structural relationships between different parts to represent the whole skeleton. The TPR then uses a self-attention mechanism to exploit intra-frame interactions and employs a temporal transformer to capture long-term dependencies, so that discriminative spatio-temporal features can be obtained for effective motion recovery. Extensive experiments on public datasets qualitatively and quantitatively verify the superiority of the proposed learning framework for mocap data recovery and show improved performance over state-of-the-art methods.
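To make the intra-frame component concrete, the following is a minimal, illustrative sketch of single-head scaled dot-product self-attention applied over the joints of one frame. It is not the paper's implementation: the joint count, feature dimension, function names, and random projection matrices are all assumptions chosen for demonstration only (in the actual model the projections would be learned).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def intra_frame_self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over the joints of one frame.

    X          : (J, d) per-joint feature matrix for one frame.
    Wq, Wk, Wv : (d, d) query/key/value projections (learned in
                 practice; random here purely for illustration).
    Returns a (J, d) matrix of attended joint features, where each
    joint's output is a weighted mix of all joints in the frame.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))  # (J, J) joint-to-joint weights
    return A @ V

# Hypothetical sizes: 21 joints with 16-dimensional features.
rng = np.random.default_rng(0)
J, d = 21, 16
X = rng.standard_normal((J, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Y = intra_frame_self_attention(X, Wq, Wk, Wv)
print(Y.shape)  # (21, 16)
```

Because the attention matrix couples every joint with every other joint in the same frame, features at observed joints can inform the reconstruction of joints whose markers are missing.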