Surveillance cameras are now widely deployed in densely populated areas to promote public safety. However, detecting theft in surveillance footage remains a challenge. Theft is a global problem that causes significant casualties and financial losses every year, yet there has been limited research on efficient techniques for identifying this anomaly. In this manuscript, an efficient multi-scale attention-based convolutional neural network with a skill optimization algorithm (MSAC-SOA) is proposed for detecting theft in surveillance videos during both day and night. The approach uses an enhanced single-shot multibox detector (ESSD) for object detection, which identifies and localizes objects within each video frame. The proposed MSAC-SOA then detects and classifies instances of stealing. Finally, a theft detection alert message is sent to the control room. The approach is evaluated on two datasets: the University of Central Florida (UCF) crime dataset for daytime theft detection and real-time video data for nighttime theft detection. The proposed method achieves faster computation times than previous techniques, indicating its efficiency and suitability for real-life videos across diverse scenarios. It also performs strongly across the reported metrics: 99.88% accuracy, 99.66% precision, 100% recall, 99.89% F1-score, 99.83% area under the ROC curve (AUC), and 99.83% specificity.
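
For readers unfamiliar with the reported metrics, the following minimal sketch shows how accuracy, precision, recall, F1-score, and specificity are conventionally derived from binary confusion-matrix counts (theft versus non-theft). The function name `classification_metrics` and the example counts are hypothetical illustrations, not the evaluation code or data used in this work. AUC is omitted because it requires per-sample classifier scores rather than counts.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives, where the
    positive class is assumed to be "theft".
    """
    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A recall of 100%, as reported above, corresponds to `fn = 0` in this formulation: no theft instance in the test data is missed.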