Responsible AI for Automated Analysis of Integrated Video Surveillance in Public Spaces

Nehemia Sugianto

doi:10.25904/1912/4371

Abstract

Understanding customer experience in real time can help support people’s safety and comfort in public spaces. Existing techniques, such as surveys and interviews, can only analyse data at specific times. Organisations that manage public spaces, such as local governments or business entities, therefore cannot respond immediately when urgent action is needed. Manual monitoring through surveillance cameras enables organisation personnel to observe people, but fatigue and distraction during constant observation make reliable and timely analysis difficult to ensure. Artificial intelligence (AI) can automate the observation of people and analyse their movement and related properties in real time. Analysing people’s facial expressions can provide insight into how comfortable they are in a certain area, while analysing crowd density can indicate the area’s safety level. By observing long-term patterns in crowd density, movement, and spatial data, the organisation can also gain insight for developing better strategies to improve people’s safety and comfort. There are three challenges to making an AI-enabled video surveillance system work well in public spaces. The first is the readiness of AI models to be deployed in public space settings. Existing AI models are designed to work in generic or particular settings and suffer performance degradation when deployed in a real-world setting. The models therefore require further development to tailor them to the specific environment of the targeted deployment setting. The second is the inclusion of an AI continual learning capability to adapt the models to the environment. AI continual learning learns from new data collected from cameras in order to adapt the models to the constant visual changes introduced in the setting. Existing continual learning approaches require long-term data retention and access to past data, which raises data privacy issues.
Third, most existing AI-enabled surveillance systems rely on centralised processing, meaning that data are transmitted to a central or cloud machine for video analysis. Such an approach involves data privacy and security risks: serious threats, such as data theft, eavesdropping, or cyberattack, can occur during data transmission. This study aims to develop an AI-enabled intelligent video surveillance system for public spaces, based on deep learning techniques and grounded in responsible AI principles. This study formulates three responsible AI criteria, which serve as guidelines for designing, developing, and evaluating the system. Based on the criteria, a framework is constructed to scale up the system over time so that it can be readily deployed in a specific real-world environment while respecting people’s privacy. The framework incorporates three AI learning approaches to iteratively refine the AI models within the bounds of ethical data use. First is the AI knowledge transfer approach, which adapts existing AI models from generic deployment to specific real-world deployment with limited surveillance datasets. Second is the AI continuous learning approach, which continuously adapts AI models to visual changes introduced by the environment without long-term data retention or the need for past data. Third is the AI federated learning approach, which limits the transmission of sensitive and identifiable data by performing computation locally on edge devices rather than transmitting the data to a central machine. This thesis contributes to the study of responsible AI, specifically in the video surveillance context, from both technical and non-technical perspectives. It uses three use cases at an international airport as the application context, aiming to understand passenger experience in real time in support of people’s safety and comfort. A new video surveillance system is developed based on the framework to provide automated people observation in the application context.
Based on a real deployment using selected cameras at the airport, the evaluation demonstrates that the system can provide real-time automated video analysis for the three use cases while respecting people’s privacy. Comprehensive experiments indicate that AI knowledge transfer can be an effective way to address the issue of limited surveillance datasets, by transferring knowledge from similar datasets rather than training from scratch on surveillance datasets. It can be further improved by transferring knowledge incrementally from multiple datasets with smaller gaps between them rather than in a one-stage process. Learning without Forgetting is a viable approach for AI continuous learning in the video surveillance context. It consistently outperforms the fine-tuning and joint-training approaches while retaining less data and without needing past data. AI federated learning can be a feasible solution for enabling continuous learning in the video surveillance context without compromising model accuracy. It can obtain accuracy comparable to joint training with a shorter training time.
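To illustrate the federated learning idea summarised above, the following minimal Python sketch shows how edge devices could send only model weights, not video frames, to a central machine for aggregation. The abstract does not state which aggregation rule the thesis uses; sample-weighted averaging in the style of federated averaging (FedAvg) is assumed here, and the names `local_update` and `federated_average`, as well as the plain-list weight format, are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical weight format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def local_update(weights: Weights, gradient: Weights, lr: float = 0.1) -> Weights:
    """One illustrative local training step on an edge device (plain gradient descent).

    The raw camera data used to compute `gradient` never leaves the device.
    """
    return {
        name: [w - lr * g for w, g in zip(values, gradient[name])]
        for name, values in weights.items()
    }


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Aggregate client weights on the central machine.

    Each entry is (client_weights, number_of_local_samples); the result is the
    sample-weighted mean of every parameter (assumed FedAvg-style rule).
    """
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("at least one client must report a positive sample count")
    reference = updates[0][0]
    return {
        name: [
            sum(client[name][i] * n for client, n in updates) / total
            for i in range(len(reference[name]))
        ]
        for name in reference
    }
```

For example, if one camera node reports weights `[1.0, 2.0]` from 1 sample and another reports `[3.0, 4.0]` from 3 samples, `federated_average` returns `[2.5, 3.5]`: the node with more local data has proportionally more influence, and only the weight vectors are transmitted.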
