Skeleton-based human action recognition and analysis have become increasingly attainable in many areas, such as security surveillance and anomaly detection. Given the prevalence of skeleton-based applications, tampering attacks on human skeletal features have emerged very recently. In particular, checking the temporal inconsistency and/or incoherence (TII) in the skeletal sequence of human action is a principle of forgery detection. To this end, we propose an approach to self-supervised learning of the temporal causality behind human action, which can effectively check TII in skeletal sequences. Especially, we design a multilevel skeleton-based forgery detection framework to recognize the forgery on frame level, clip level, and action level in terms of learning the corresponding temporal-causal skeleton representations for each level. Specifically, a hierarchical graph convolution network architecture is designed to learn low-level skeleton representations based on physical skeleton connections and high-level action representations based on temporal-causal dependencies for specific actions. Extensive experiments consistently show state-of-the-art results on multilevel forgery detection tasks and superior performance of our framework compared to current competing methods.