Video class-incremental learning aims to recognize new actions while restricting the catastrophic forgetting of old ones, whose representative samples can only be saved in limited memory. Semantically variable subactions are susceptible to class confusion due to data imbalance. While existing methods address the problem by estimating and distilling the spatio-temporal knowledge, we further explores that the refinement of hierarchical correlations is crucial for the alignment of spatio-temporal features. To enhance the adaptability on evolved actions, we proposes a hierarchical aggregation strategy, in which hierarchical matching matrices are combined and jointly optimized to selectively store and retrieve relevant features from previous tasks. Meanwhile, a correlation refinement mechanism is presented to reinforce the bias on informative exemplars according to online hypercorrelation distribution. Experimental results demonstrate the effectiveness of the proposed method on three standard video class-incremental learning benchmarks, outperforming state-of-the-art methods. Code is available at: https://github.com/Lsen991031/HCE