PurposeThis paper aims to implement and extend the You Only Live Once (YOLO) algorithm for detection of objects and activities. The advantage of YOLO is that it only runs a neural network once to detect the objects in an image, which is why it is powerful and fast. Cameras are found at many different crossroads and locations, but video processing of the feed through an object detection algorithm allows determining and tracking what is captured. Video Surveillance has many applications such as Car Tracking and tracking of people related to crime prevention. This paper provides exhaustive comparison between the existing methods and proposed method. Proposed method is found to have highest object detection accuracy.Design/methodology/approachThe goal of this research is to develop a deep learning framework to automate the task of analyzing video footage through object detection in images. This framework processes video feed or image frames from CCTV, webcam or a DroidCam, which allows the camera in a mobile phone to be used as a webcam for a laptop. The object detection algorithm, with its model trained on a large data set of images, is able to load in each image given as an input, process the image and determine the categories of the matching objects that it finds. As a proof of concept, this research demonstrates the algorithm on images of several different objects. This research implements and extends the YOLO algorithm for detection of objects and activities. The advantage of YOLO is that it only runs a neural network once to detect the objects in an image, which is why it is powerful and fast. Cameras are found at many different crossroads and locations, but video processing of the feed through an object detection algorithm allows determining and tracking what is captured. For video surveillance of traffic cameras, this has many applications, such as car tracking and person tracking for crime prevention. In this research, the implemented algorithm with the proposed methodology is compared against several different prior existing methods in literature. The proposed method was found to have the highest object detection accuracy for object detection and activity recognition, better than other existing methods.FindingsThe results indicate that the proposed deep learning–based model can be implemented in real-time for object detection and activity recognition. The added features of car crash detection, fall detection and social distancing detection can be used to implement a real-time video surveillance system that can help save lives and protect people. Such a real-time video surveillance system could be installed at street and traffic cameras and in CCTV systems. When this system would detect a car crash or a fatal human or pedestrian fall with injury, it can be programmed to send automatic messages to the nearest local police, emergency and fire stations. When this system would detect a social distancing violation, it can be programmed to inform the local authorities or sound an alarm with a warning message to alert the public to maintain their distance and avoid spreading their aerosol particles that may cause the spread of viruses, including the COVID-19 virus.Originality/valueThis paper proposes an improved and augmented version of the YOLOv3 model that has been extended to perform activity recognition, such as car crash detection, human fall detection and social distancing detection. The proposed model is based on a deep learning convolutional neural network model used to detect objects in images. The model is trained using the widely used and publicly available Common Objects in Context data set. The proposed model, being an extension of YOLO, can be implemented for real-time object and activity recognition. The proposed model had higher accuracies for both large-scale and all-scale object detection. This proposed model also exceeded all the other previous methods that were compared in extending and augmenting the object detection to activity recognition. The proposed model resulted in the highest accuracy for car crash detection, fall detection and social distancing detection.
Read full abstract