A Deep Learning-Based Approach for Inappropriate Content Detection and Classification of YouTube Videos

Kanwal Yousaf, Tabassam Nawaz

doi:10.1109/access.2022.3147519

Open Access

https://doi.org/10.1109/access.2022.3147519

Journal: IEEE Access
Publication Date: Jan 1, 2022
Citations: 37
License type: CC BY 4.0

Affiliation: University of Engineering and Technology Taxila

Abstract

The exponential growth of videos on YouTube has attracted billions of viewers, the majority of whom belong to a young demographic. Malicious uploaders also see this platform as an opportunity to spread upsetting visual content, for example by using animated cartoon videos to share inappropriate content with children. An automatic real-time video content filtering mechanism is therefore strongly recommended for integration into social media platforms. In this study, a novel deep learning-based architecture is proposed for the detection and classification of inappropriate content in videos. The proposed framework employs an ImageNet pre-trained convolutional neural network (CNN) model, EfficientNet-B7, to extract video descriptors, which are then fed to a bidirectional long short-term memory (BiLSTM) network to learn effective video representations and perform multiclass video classification. An attention mechanism is also integrated after the BiLSTM to apply an attention probability distribution in the network. These models are evaluated on a manually annotated dataset of 111,156 cartoon clips collected from YouTube videos. Experimental results show that EfficientNet-BiLSTM (accuracy = 95.66%) performs better than the attention mechanism-based EfficientNet-BiLSTM framework (accuracy = 95.30%). In addition, traditional machine learning classifiers perform worse than deep learning classifiers. Overall, the architecture of EfficientNet and BiLSTM with 128 hidden units yielded state-of-the-art performance (F1 score = 0.9267). Furthermore, a performance comparison against existing state-of-the-art approaches verified that a BiLSTM on top of a CNN captures the contextual information of video descriptors better, and hence achieves better results in detecting and classifying video content that is inappropriate for children.
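
The "attention probability distribution" mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple dot-product scoring form, which the abstract does not specify: each per-frame hidden state (for example, a BiLSTM output) is scored against a learned vector, the scores are normalized with a softmax, and the states are averaged using those weights. The names `attention_pool` and `query` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Collapse a sequence of hidden states into one context vector.

    hidden_states: list of T vectors, one per time step (e.g. one per frame).
    query: hypothetical learned scoring vector with the same dimension.
    Returns (weights, context), where weights is a probability
    distribution over the T time steps.
    """
    # Assumed scoring function: dot product between each state and the query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: attention-weighted sum of the hidden states.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, context = attention_pool(states, query=[0.5, 0.5])
    print("weights:", weights)
    print("context:", context)
```

In a classifier of the kind described, the resulting context vector would be passed to a softmax output layer in place of the final BiLSTM state. The reported accuracies (95.66% without attention versus 95.30% with it) suggest that this extra step did not help on the authors' dataset.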

Highlights

The creation and consumption of videos on social media platforms have grown drastically over the past few years
This study found that features from the pretrained convolutional neural network (CNN) model (EfficientNet-B7), combined with a support vector machine (SVM) classifier, performed better than other techniques
This paper addresses the aforementioned problem by working with an ImageNet pretrained CNN (EfficientNet-B7) and bidirectional long short-term memory (BiLSTM) neural networks

Summary

Introduction

The creation and consumption of videos on social media platforms have grown drastically over the past few years. Billions of hours of videos are available on this video corpus, where users of all age groups can explore generic as well as personalized content [2]. With such a large-scale crowdsourced database, it is extremely challenging to monitor and regulate the uploaded content in line with platform guidelines. This creates opportunities for malicious users to engage in spamming activities by misleading audiences with falsely advertised content (i.e., video, audio or text). This trend got people’s attention when mainstream media reported about the Elsagate

Methods

Results

Conclusion