Abstract

With the rapid growth of the online user base and the number of short videos, a large volume of videos related to terrorism and violence has appeared on the Internet, posing a serious challenge to the governance of the online environment. At present, most short-video platforms still rely on manual review and user reports to filter such videos, and these mechanisms cannot keep pace with the growth of the short-video business in either recognition accuracy or timeliness. Among single-modality methods for recognizing violent video, this paper focuses on the scene recognition modality. First, the U-Net network is improved with the SE-block module. After pretraining on the Cityscapes dataset, the improved network (SE-U-Net) performs semantic segmentation of video frames. On this basis, semantic scene features are extracted using a VGG16 network loaded with ImageNet-pretrained weights, and the SE-U-Net-VGG16 scene recognition model is constructed. The experimental results show that the prediction accuracy of the SE-U-Net model is much higher than that of the FCN and U-Net models, indicating that SE-U-Net has significant advantages for the scene recognition modality.
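
For readers unfamiliar with the SE-block mentioned above, the following is a minimal sketch of the generic squeeze-and-excitation operation on a single feature map, written in plain Python for clarity. It is not the paper's implementation: the abstract does not state where the blocks are placed inside U-Net, what reduction ratio is used, or how the weights are trained, so the function name `se_block`, the weight matrices `w1` and `w2`, and the omission of bias terms are all assumptions made for illustration.

```python
import math


def se_block(feature_map, w1, w2):
    """Illustrative squeeze-and-excitation over a feature map shaped [C][H][W].

    w1 is a hypothetical bottleneck weight matrix shaped [C // r][C];
    w2 is a hypothetical expansion weight matrix shaped [C][C // r].
    Bias terms are omitted to keep the sketch short (an assumption).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * z for w, z in zip(weights, squeezed)))
        for weights in w1
    ]
    # Excitation, step 2: expand back to C values and apply a sigmoid,
    # giving one gate in (0, 1) per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: 2 channels of size 2x2, with hand-picked (untrained) weights.
toy_map = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
toy_w1 = [[1.0, -1.0]]    # reduces 2 channels to 1 hidden unit
toy_w2 = [[0.0], [1.0]]   # expands 1 hidden unit back to 2 gates
toy_scaled, toy_gates = se_block(toy_map, toy_w1, toy_w2)
print(toy_gates)
```

The point of the operation is that each channel is reweighted by a gate computed from global context, so informative channels can be emphasized and less useful ones suppressed. In a real network the weights would be learned jointly with the rest of the model.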
