Abstract

Video scene segmentation is an important step in content-based video retrieval (CBVR). To address the low efficiency of video scene segmentation in CBVR, this paper proposes a multi-modal video scene segmentation optimization algorithm based on convolutional neural network (CNN) feature extraction. Because the multi-modal data of a video carries a large amount of information, the VGG19 network is modified in a targeted manner, and low-level and semantic features of each modality are extracted from every video shot. These features are assembled into vectors, and methods such as triplet loss learning and shot similarity calculation are used to convert the scene segmentation task into a binary classification problem on shot boundaries. A scoring mechanism is then established to optimize the results and complete the scene segmentation. Experimental results show that the algorithm is effective for video scene segmentation, with overall recall and precision reaching 85.77% and 87.01%, respectively. Compared with the shot similarity graph method, these two metrics increase by 10% and 9%, respectively. Compared with the DeepSSS method, which also uses a deep learning network model, the comprehensive F-measure increases by 8%.
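As a rough illustration of the ideas named in the abstract, the following minimal Python sketch shows a triplet loss on shot feature vectors, a similarity-based binary decision for each boundary between adjacent shots, and the usual F-measure computed from precision and recall. It is not the paper's implementation: the function names, the use of unsquared Euclidean distance and cosine similarity, the margin, and the threshold are all assumptions chosen for illustration, and the scoring mechanism is not modelled.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull the anchor towards the positive
    (same scene) and push it away from the negative (different scene).
    The margin value is an illustrative assumption."""
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def classify_boundaries(shot_features, threshold=0.5):
    """For each pair of adjacent shots, return True if the boundary
    between them is treated as a scene boundary (similarity below the
    threshold). The threshold is a hypothetical parameter."""
    return [
        cosine_similarity(shot_features[i], shot_features[i + 1]) < threshold
        for i in range(len(shot_features) - 1)
    ]


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 2-D "features" for three shots; real features would come from a CNN.
    shots = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(classify_boundaries(shots))
```

In this toy example the first two shots are nearly parallel in feature space and the third is nearly orthogonal to them, so only the second boundary falls below the assumed threshold.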

