As parents, we all want to protect our children from the dangers of online pornography, cyberbullying, and predators. Most current methods rely on limited information gathered from the child's online interactions: some use a blacklist of prohibited URLs to block access to certain websites, while others analyze the multimedia exchanged between the child and others. These approaches are not foolproof, however, as new URLs can evade a blacklist, and individual images, videos, and text messages may appear harmless when considered in isolation. We propose a flexible framework that instead examines material at the Human-Computer Interaction (HCI) level, i.e., the user interface in its completed state, as the child actually sees it. Despite hardware restrictions, Children's Agents for Secure and Privacy Enhanced Reaction (CASPER) analyzes audio signals and screen captures in real time to make judgments based on all available information. Using deep learning methods for text, audio, and image processing, CASPER classifies visual content as either pornographic or neutral, and text as cyberbullying, neutral, or objectionable. We have created a custom dataset containing a variety of offensive material for training and evaluation. For text classification, CASPER achieves an average accuracy of 88% and a score of 0.85, while its accuracy for pornographic image classification is 95%.
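The decision logic described above, in which per-modality classifiers run on each screen capture and their verdicts are fused into one judgment, can be sketched as follows. This is a minimal illustrative outline, not the authors' implementation: the function names, labels, and the 0.5 confidence threshold are all assumptions, and the stub classifiers stand in for the deep learning models.

```python
from dataclasses import dataclass

@dataclass
class Judgment:
    label: str        # e.g. "neutral", "pornographic", "cyberbullying"
    confidence: float

def judge_frame(image_classifier, text_classifier, screen_pixels, extracted_text):
    """Combine image and text classifier verdicts for one screen capture."""
    verdicts = [image_classifier(screen_pixels), text_classifier(extracted_text)]
    # Flag the frame if any modality reports harmful content with enough
    # confidence (0.5 is an assumed threshold); otherwise treat it as neutral.
    flagged = [v for v in verdicts if v.label != "neutral" and v.confidence >= 0.5]
    return max(flagged, key=lambda v: v.confidence) if flagged else Judgment("neutral", 1.0)

# Stub classifiers standing in for the deep learning models.
def stub_image(_pixels):
    return Judgment("neutral", 0.9)

def stub_text(text):
    return Judgment("cyberbullying", 0.8) if "insult" in text else Judgment("neutral", 0.7)

print(judge_frame(stub_image, stub_text, b"...", "an insult here").label)  # cyberbullying
```

The key design point this sketch captures is that judgment is made over the assembled interface state rather than over any single image or message, so content that looks harmless in one modality can still be flagged by another.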