Abstract
Smart video surveillance helps to build a more robust smart city environment. Cameras at varied angles act as smart sensors, collecting visual data from the smart city environment and transmitting it for further visual analysis. The transmitted visual data must be of high quality for efficient analysis, which is challenging when videos are transmitted over low-bandwidth communication channels. The latest smart surveillance cameras maintain high-quality video transmission through video encoding techniques such as High Efficiency Video Coding. However, these techniques still provide limited capabilities, and the demand for high-quality encoding of salient regions such as pedestrians, vehicles, cyclists/motorcyclists and roads in video surveillance systems is still not met. This work contributes towards building an efficient salient region-based surveillance framework for smart cities. The proposed framework integrates a deep learning-based video surveillance technique that extracts salient regions from a video frame without information loss and then encodes them at a reduced size. We applied this approach in diverse smart city case-study environments to test the applicability of the framework. The proposed work reports results of 56.92% for bitrate, 5.35 dB for peak signal-to-noise ratio, and salient region (SR) based segmentation accuracies of 92% and 96% on two different benchmark datasets. Consequently, the generation of less computationally demanding region-based video data makes the framework adaptable for improving surveillance solutions in smart cities.
Highlights
Video surveillance is a key element in the groundwork of a balanced smart city
We propose a surveillance framework, Efficient Shallow Segmentation based Encoding (ESSE), that integrates deep learning-based salient-region (S-R) extraction and efficient video encoding
The proposed Shallow Semantic Segmentation Network (S-SSN) achieved pixel-level validation accuracies of 96% and 92% on the CamVid and Mapillary Vistas datasets, respectively
Summary
Video surveillance is a key element in the groundwork of a balanced smart city. Video data from all over the smart city is collected by cameras acting as visual sensors. Another state-of-the-art approach, proposed in [45], is a deep learning (DL) based traffic video compression method that extracts regions of interest with a DL-based localization method. It achieved prominent results for peak signal-to-noise ratio (PSNR), but at a higher bit-rate, because the localization method captures unnecessary extra regions along with the salient regions (S-Rs). The main problem for high-efficiency smart city video surveillance encoding is to (1) extract salient regions efficiently with low inference time. This paper presents an Efficient Shallow Segmentation based Encoding (ESSE) framework, which preserves the high quality of salient regions in smart city surveillance video while also reducing its size. It helps to identify suspected persons/vehicles and to detect traffic and roads.
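The two quality measures named above, pixel-level segmentation accuracy and PSNR, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy frames and the salient-region mask are all hypothetical, and the paper's exact evaluation protocol is not given in this summary. The sketch only shows how PSNR can be restricted to salient pixels, so that quality is measured where the framework aims to preserve it.

```python
import math

def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted class label matches the ground truth."""
    total = 0
    correct = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += 1
            correct += (p == t)
    return correct / total

def psnr(original, decoded, mask=None, peak=255):
    """PSNR in dB over all pixels, or only over pixels where mask is truthy."""
    squared_error = 0.0
    count = 0
    for y, (orig_row, dec_row) in enumerate(zip(original, decoded)):
        for x, (o, d) in enumerate(zip(orig_row, dec_row)):
            if mask is not None and not mask[y][x]:
                continue  # skip non-salient pixels
            squared_error += (o - d) ** 2
            count += 1
    if count == 0:
        raise ValueError("no pixels selected")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical content
    return 10 * math.log10(peak ** 2 / mse)

# Toy 2x3 grayscale frame and its decoded version (illustrative values only).
original = [[100, 120, 140], [160, 180, 200]]
decoded = [[100, 121, 150], [160, 178, 230]]
# Hypothetical salient-region mask: 1 marks a salient pixel.
salient = [[1, 1, 0], [1, 1, 0]]

# Hypothetical predicted and ground-truth class labels per pixel.
pred_labels = [[0, 1, 1], [2, 2, 0]]
true_labels = [[0, 1, 2], [2, 2, 0]]

print("pixel accuracy:", pixel_accuracy(pred_labels, true_labels))
print("PSNR, whole frame (dB):", psnr(original, decoded))
print("PSNR, salient region (dB):", psnr(original, decoded, mask=salient))
```

In this toy case the decoded frame is distorted mostly outside the mask, so the salient-region PSNR is higher than the whole-frame PSNR, which is the kind of trade-off a salient region-based encoder targets.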