Inappropriate visual content has spread across the internet, and children are therefore unintentionally exposed to sexually explicit material. Animated cartoon movies sometimes contain sensitive content such as pornography and explicit sexual scenes. Video-sharing platforms usually address children's e-safety through manual censorship, which is both time-consuming and expensive; automated cartoon censorship is therefore highly desirable for integration into media platforms. In this paper, various approaches were explored for detecting inappropriate visual content in cartoon animation. First, state-of-the-art conventional feature techniques were utilised and evaluated. In addition, a simple end-to-end convolutional neural network (CNN) was used and found to outperform the conventional techniques in terms of accuracy (85.33%) and F1 score (83.46%). Furthermore, deeper CNN architectures, ResNet and EfficientNet, were evaluated and compared. The CNN-extracted features were mapped into two classes: normal and porn. To improve the model's performance, we utilised feature- and decision-fusion approaches, which were found to outperform state-of-the-art techniques in terms of accuracy (87.87%), F1 score (87.87%), and AUC (94.40%). To validate the domain-generalisation performance of the proposed methods, CNNs pre-trained on the cartoon dataset were evaluated on the public NPDI-800 natural videos and achieved an accuracy of 79.92% and an F1 score of 80.58%. Similarly, CNNs pre-trained on the public NPDI-800 natural videos were evaluated on the cartoon dataset and achieved an accuracy of 82.666% and an F1 score of 81.588%. Finally, a novel cartoon pornography dataset, with various characters, skin colours, positions, viewpoints, and scales, was proposed.
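
The abstract does not include code; the sketch below is only a minimal illustration of the decision-fusion idea mentioned above, assuming a PyTorch/torchvision setup with the ResNet and EfficientNet backbones named in the paper. The checkpoint-free initialisation, equal-weight probability averaging, and all function and variable names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): late decision fusion of two CNN
# backbones for binary normal-vs-porn frame classification.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 2  # normal, porn

# Backbones named in the abstract; classification heads replaced for 2 classes.
resnet = models.resnet50(weights=None)
resnet.fc = nn.Linear(resnet.fc.in_features, NUM_CLASSES)

effnet = models.efficientnet_b0(weights=None)
effnet.classifier[1] = nn.Linear(effnet.classifier[1].in_features, NUM_CLASSES)

def decision_fusion(frame_batch: torch.Tensor) -> torch.Tensor:
    """Average the per-class probabilities of both networks (late fusion)."""
    resnet.eval()
    effnet.eval()
    with torch.no_grad():
        p1 = torch.softmax(resnet(frame_batch), dim=1)
        p2 = torch.softmax(effnet(frame_batch), dim=1)
    return (p1 + p2) / 2  # fused probabilities; argmax gives the final label

# Usage (illustrative): probs = decision_fusion(frames); labels = probs.argmax(dim=1)
```

A feature-fusion variant would instead concatenate the penultimate-layer features of the two backbones and train a single classifier on the joint vector; the choice between the two is what the fusion experiments in the paper compare.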