In this paper, we propose a novel framework for multi-person pose estimation and tracking in challenging scenarios. Because occlusion and motion blur hinder pose tracking, we model humans as graphs and perform pose estimation and tracking by concentrating on the visible parts of human bodies, which remain informative about the complete skeleton under incomplete observations. Specifically, the proposed framework involves three parts: (i) a Sparse Key-point Flow Estimating Module (SKFEM) and a Hierarchical Graph Distance Minimizing Module (HGMM) estimate pixel-level and human-level motion, respectively; (ii) pixel-level appearance consistency and human-level structural consistency are combined to measure the visibility scores of body joints. These scores guide the pose estimator to predict complete skeletons from high-visibility parts, under the assumption that visible and invisible parts are inherently correlated in human part graphs, and the pose estimator is iteratively fine-tuned to acquire this capability; (iii) multiple historical frames are combined to improve tracking, which is implemented using HGMM. The proposed approach not only achieves state-of-the-art performance on the PoseTrack datasets but also yields significant improvements in other tasks such as human-related anomaly detection.