The rapid growth in autonomous industrial environments has increased the need for intelligent video surveillance. As a predominant element of video surveillance, recognition of complex human movements is important in a wide range of surveillance applications. However, current state-of-the-art video surveillance techniques use supervised deep learning pipelines for human activity recognition (HAR). A key shortcoming of such techniques is their inability to learn from unlabeled video streams. To operate effectively in natural environments, video surveillance techniques must handle huge volumes of unlabeled video data, and must monitor and generate alerts and insights from multiple characteristics such as spatial structure, motion flow, and color distribution. Furthermore, most conventional learning systems lack a memory persistence capability that can reduce the influence of outdated information in memory-guided decision-making; this limits plasticity and causes overfitting to specific past events. In this article, we propose a new adaptation of the Growing Self-Organizing Map (GSOM) that addresses these shortcomings by 1) adopting two proven concepts of traditional deep learning, hierarchical and multistream learning, and applying them to the GSOM self-structuring architecture to accommodate learning from unlabeled video data and their diverse characteristics, and 2) addressing overfitting and the influence of outdated information on the neural architecture by implementing a transience property in the algorithm. We demonstrate the proposed model using three benchmark video datasets, and the results confirm its validity and usability for HAR.