Abstract With advances in computational capability, deep learning has expanded from 2D image tasks to 3D video tasks. Human action recognition is a typical 3D video task that depends on accurately capturing local body structures. However, existing methods struggle to capture these structures precisely, especially for joints within the same body region that exhibit diverse static and dynamic features. To address this, we introduce a Structure-Enhanced Positional Embedding Module and integrate it into the PoseFormer model, yielding the HStruPE-Former model. Experiments on the Human3.6M dataset show that HStruPE-Former improves performance on 3D human pose estimation, with better stability and accuracy particularly in scenarios involving intricate movements and postures. Although the improvements are modest, this work is expected to encourage exploration of the untapped potential of structure-enhanced positional embedding modules in other neural network applications. The results thus both improve performance on current 3D video tasks and suggest new research directions in deep learning.