State-of-the-art semi-supervised models are designed primarily for images. When semi-supervised learning augmented with temporal data is applied to video-level action recognition, it still suffers from severe model mismatch: the models cannot adequately capture both local and global information about an action. Moreover, constant-threshold pseudo-labeling leads to low utilization of unlabeled data for difficult actions in the early stages of training and to poor pseudo-label quality, which degrades recognition accuracy. To make the semi-supervised framework FixMatch more suitable for action recognition, we propose two components: Time-Mixer and Dynamic Threshold. Time-Mixer exploits complementary information across time sequences by fusing two-channel temporal context information. Dynamic Threshold uses a new core mapping function (the normal distribution function) to improve pseudo-label quality. Extensive experiments on three action recognition datasets (Kinetics-400, UCF-101, and HMDB-51) show that the performance of the semi-supervised model improves considerably with dynamic thresholding and temporal-context fusion: on UCF-101 with a 10% labeling rate, accuracy improves by 14.4% over the baseline and by 1.8% over TG, and DTIF achieves good overall performance.
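To illustrate the idea of a progress-dependent confidence threshold, the following is a minimal, hypothetical sketch in Python. The abstract names the normal distribution function as the core mapping but does not give the exact formula or constants, so the function name `dynamic_threshold` and the parameters `tau_min`, `tau_max`, and `sigma` below are illustrative assumptions, not the paper's actual method.

```python
import math

def dynamic_threshold(progress, tau_min=0.5, tau_max=0.95, sigma=0.35):
    """Map training progress in [0, 1] to a pseudo-label confidence threshold.

    Hypothetical sketch: a Gaussian-shaped curve centred at progress = 1
    starts near tau_min (admitting more pseudo-labels for hard actions
    early in training) and rises smoothly toward tau_max as the model
    matures. All constants here are illustrative choices.
    """
    gauss = math.exp(-((progress - 1.0) ** 2) / (2.0 * sigma ** 2))
    return tau_min + (tau_max - tau_min) * gauss
```

In a FixMatch-style loop, an unlabeled clip would contribute a pseudo-label only when the model's maximum softmax probability exceeds `dynamic_threshold(step / total_steps)`, so the filter tightens as training progresses.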