Abstract

Video pornography and nudity detection aims to detect people in videos and classify them as nude or normal for censorship purposes. Recent literature has demonstrated pornography detection using a convolutional neural network (CNN) to extract features directly from whole frames and a support vector machine (SVM) to classify the extracted features into two categories. However, existing methods were not able to detect small-scale pornographic and nude content in frames with diverse backgrounds. This limitation has led to a high false-negative rate (FNR), that is, the misclassification of nude frames as normal ones. To address this limitation of existing convolutional-only approaches, this paper focuses the visual attention of the CNN on the expected nude regions inside the frames in order to reduce the FNR. The You Only Look Once (YOLO) object detector was transferred to the pornography and nudity detection application to detect persons as regions of interest (ROIs), which were then passed to the CNN and SVM for nude/normal classification. Several experiments were conducted to compare the performance of various CNNs and classifiers using our proposed dataset. ResNet101 with random forest was found to outperform the other models, with an F1-score of 90.03% and an accuracy of 87.75%. Furthermore, an ablation study was performed to demonstrate the impact of adding YOLO before the CNN. YOLO–CNN was shown to outperform CNN-only in terms of accuracy, which increased from 85.5% to 89.5%. Additionally, a new benchmark dataset with challenging content, including various human sizes and backgrounds, was proposed.

Highlights

  • Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. Given the vast growth in the quantity of videos and images in all types of media nowadays, various content understanding methods have been developed and employed in real-world scenarios

  • If the frame contains more than one person, each person's image patch is passed to the convolutional neural network (CNN) to produce one category, which is stored in a list

  • The problem associated with nudity detection at various scales and backgrounds was addressed
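The per-person step described in the highlights can be sketched as follows. This is a minimal illustration, not the authors' implementation: `detect_persons` and `classify_patch` are hypothetical callables standing in for the YOLO detector and the CNN-plus-classifier stage, and the rule that a single nude person makes the whole frame nude (with a frame containing no detected person labelled normal) is an assumption, since the text above does not say how the list of categories is reduced to one frame label.

```python
from typing import Callable, List, Sequence, Tuple

# (x1, y1, x2, y2) in pixel coordinates; x2 and y2 are exclusive.
Box = Tuple[int, int, int, int]


def crop(frame: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Cut the rectangular patch described by `box` out of a 2-D frame."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in frame[y1:y2]]


def classify_frame(
    frame: Sequence[Sequence[int]],
    detect_persons: Callable[[Sequence[Sequence[int]]], List[Box]],
    classify_patch: Callable[[List[List[int]]], str],
) -> Tuple[str, List[str]]:
    """Label one frame from the labels of the persons detected in it.

    `detect_persons` and `classify_patch` are hypothetical stand-ins for
    the person detector and the feature-extractor/classifier pair.
    """
    categories: List[str] = []
    for box in detect_persons(frame):
        patch = crop(frame, box)
        # One category per detected person, stored in a list.
        categories.append(classify_patch(patch))

    # Assumed aggregation rule (not stated in the source):
    # any nude person makes the frame nude; no detections means normal.
    frame_label = "nude" if "nude" in categories else "normal"
    return frame_label, categories
```

Passing the detector and classifier in as plain functions keeps the sketch independent of any particular YOLO or CNN library, so the aggregation logic can be exercised with simple stubs.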

Summary

Introduction

The same approach of transfer learning was used with our proposed challenging nudity dataset to extract features from the images. Although CNN-only methods have shown superior performance for pornography detection when the pornographic content covers most of the frame, their performance is still limited when the nude persons occupy only small regions of the frame and when the background is complex, such as nude people in forests, snow, beaches, supermarkets, indoor scenes, and streets. The proposed method detects nude regions at different scales inside frames with complex backgrounds.

Datasets Overview
ImageNet Dataset
NDPI Dataset
Testing Film Dataset
Methodology
YOLO-Based Human Detection
CNN-Based Feature Extraction
Various CNN Architectures
Classification
Method
Experimental Setup and Results
Performance Metrics
F1 score: this metric summarizes recall and precision in one term
The First Experiment
The Second Experiment
The Third Experiment
Class Activation Mapping
Findings
Conclusions and Future Work
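
The metrics named in this summary (F1 score, accuracy, and false-negative rate) can all be computed from the four counts of a binary confusion matrix. The sketch below uses the standard definitions and assumes that "nude" is the positive class; the `Confusion` class and `metrics` function are illustrative names, and the counts in the usage example are invented for demonstration and are not results from the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    tp: int  # nude frames predicted nude
    fp: int  # normal frames predicted nude
    fn: int  # nude frames predicted normal (missed nudity)
    tn: int  # normal frames predicted normal


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics(c: Confusion) -> Dict[str, float]:
    """Compute precision, recall, F1, FNR and accuracy from the counts."""
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)
    return {
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": _safe_div(2 * precision * recall, precision + recall),
        # FNR is the share of nude frames labelled normal (1 - recall).
        "fnr": _safe_div(c.fn, c.fn + c.tp),
        "accuracy": _safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
    }


# Illustrative counts only.
example = metrics(Confusion(tp=8, fp=2, fn=2, tn=8))
```

Because FNR equals one minus recall, any change that lowers the FNR also raises recall and, with precision held fixed, the F1 score.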
