YouTube now hosts billions of videos, and most of its viewers are young people. The number of videos has grown exponentially. Worse, some unscrupulous users upload disturbing animated cartoons and other material that is unsuitable for children. Social media platforms therefore need a way to filter videos automatically in real time. This research proposes a new deep learning architecture to detect and classify inappropriate video content. The proposed system uses EfficientNet-B7, a convolutional neural network (CNN) pre-trained on ImageNet, to extract video descriptors. These descriptors are fed to a bidirectional long short-term memory (BiLSTM) network, which is trained to classify videos into multiple classes. An attention mechanism is also added after the BiLSTM to apply attention probabilities across the network. We evaluate these models on a dataset of 111,156 manually annotated cartoon clips collected from YouTube videos. The experimental results show that EfficientNet-BiLSTM achieves higher accuracy (95.30%) than the attention-based EfficientNet-BiLSTM framework. The deep learning classifiers also outperform traditional machine learning classifiers. The combination of EfficientNet and a BiLSTM with 128 hidden units achieves state-of-the-art performance, with an overall F1 score of 0.9267. Furthermore, a BiLSTM stacked on top of the CNN outperforms existing state-of-the-art methods in detecting and classifying video content unsuitable for children, because it better captures the contextual information in the sequence of video descriptors.
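To make the attention step concrete, the following is a minimal sketch of attention pooling over per-frame states, written in plain Python. It is not the authors' implementation: the abstract does not specify the scoring function, so the dot-product score, the `query` vector, and the names `softmax` and `attention_pool` are assumptions made for illustration. The toy `states` stand in for BiLSTM outputs computed from EfficientNet-B7 frame descriptors.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    frame_states: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Collapse per-frame states into a single clip-level vector.

    frame_states: one vector per time step (stand-in for BiLSTM outputs).
    query: hypothetical learned scoring vector (an assumption of this sketch).
    Returns (attention probabilities, weighted-sum context vector).
    """
    # Score each time step by its dot product with the query vector.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in frame_states]
    # Turn the scores into probabilities that sum to 1 across time steps.
    weights = softmax(scores)
    dim = len(frame_states[0])
    # The context vector is the probability-weighted sum of the frame states.
    context = [
        sum(w * h[i] for w, h in zip(weights, frame_states)) for i in range(dim)
    ]
    return weights, context


# Three toy "frames" with 2-dimensional states.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_pool(states, query=[1.0, 1.0])
```

In a full model, the resulting context vector would typically be passed to a dense softmax layer that outputs the class probabilities; in this sketch the third frame receives the largest weight because it has the highest score against the assumed query.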