Abstract This work addresses the challenges of scene segmentation and low segmentation accuracy in short videos by employing virtual reality (VR) technology alongside a 3D DenseNet model for real-time segmentation in dynamic scenes. First, this work extracted short videos by frame and removed redundant background information. Then, the volume rendering algorithm in VR technology was used to reconstruct short videos in dynamic scenes in 3D. It enriched the detailed information of short videos, and finally used the 3D DenseNet model for real-time segmentation of short videos in dynamic scenes, improving the accuracy of segmentation. The experiment compared the performance of High resolution network, Mask region based convolutional neural network, 3D U-Net, Efficient neural network models on the Densely annotation video segmentation dataset. The experimental results showed that the segmentation accuracy of the 3D DenseNet model has reached 99.03%, which was 15.11% higher than that of the ENet model. The precision rate reached 98.33%, and the average segmentation time reached 0.64 s, improving the segmentation accuracy and precision rate. It can adapt to various scene situations and has strong robustness. The significance of this research lies in its innovative approach in tackling these issues. By integrating VR technology with advanced deep learning models, we can achieve more precise segmentation of dynamic scenes in short videos, enabling real-time processing. This has significant practical implications for fields such as video editing, VR applications, and intelligent surveillance. Furthermore, the outcomes of this research contribute to advancing computer vision in video processing, providing valuable insights for the development of future intelligent video processing systems.