Working condition recognition for a fused magnesium furnace (FMF) suffers from three problems: under-burning samples are scarce relative to other classes, the quality of training samples is inconsistent, and static images cannot characterize the dynamic production process. This paper presents a novel approach to detecting the under-burning working condition in an FMF based on a 3D-cycle-generative adversarial network (3D-Cycle-GAN) and Video Swin Transformer-Stochastic Configuration Networks (Video Swin-SCNs). First, because Cycle-GAN synthesizes video frame by frame from visual appearance alone and therefore produces temporally discontinuous output, we construct a motion-consistency-based 3D-Cycle-GAN that imposes both visual appearance and temporal continuity constraints on unpaired video translation and generates video samples of the under-burning working condition. Second, a reinforcement learning approach is used to assess the quality of each generated video and to filter out low-quality samples. Finally, local attention is extended from the spatial domain to the spatial-temporal domain so that the dynamic production process can be characterized from video rather than from static images. The spatial-temporal features extracted by the Video Swin Transformer are fed into SCNs to classify the working conditions. Experimental results indicate that the proposed method is effective and feasible.
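To illustrate what extending local attention from the spatial domain to the spatial-temporal domain means, the following minimal Python sketch partitions a grid of video tokens into non-overlapping windows, which are the groups of tokens that attend to one another in window-based attention. The function name `partition_windows` and the grid and window sizes are assumptions chosen for illustration and are not taken from the paper; the attention computation itself and the shifted-window step are omitted.

```python
from itertools import product


def partition_windows(grid_shape, window_shape):
    """Split a (T, H, W) token grid into non-overlapping 3D windows.

    Returns a list of windows; each window is a list of (t, y, x) token
    coordinates that would attend to one another.
    Hypothetical helper for illustration only.
    """
    T, H, W = grid_shape
    wt, wh, ww = window_shape
    if T % wt or H % wh or W % ww:
        raise ValueError("grid_shape must be divisible by window_shape")
    windows = []
    # Iterate over the origin (top-left-front corner) of every window.
    for t0, y0, x0 in product(range(0, T, wt), range(0, H, wh), range(0, W, ww)):
        windows.append([
            (t0 + dt, y0 + dy, x0 + dx)
            for dt, dy, dx in product(range(wt), range(wh), range(ww))
        ])
    return windows


# Assumed sizes: 4 frames, each a 4x4 grid of tokens.
# Temporal depth 1: every window stays inside one frame (image-style attention).
spatial = partition_windows((4, 4, 4), (1, 2, 2))
# Temporal depth 2: every window spans two consecutive frames.
spatiotemporal = partition_windows((4, 4, 4), (2, 2, 2))

print(len(spatial), len(spatial[0]))
print(len(spatiotemporal), len(spatiotemporal[0]))
```

With a temporal depth of 1, no window contains tokens from more than one frame, so motion between frames is invisible to the attention layer. With a depth of 2, each window mixes tokens from adjacent frames, which is what lets the features reflect how the scene changes over time.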