Abstract

In the context of investigations, understanding relationships between videos, such as determining when their fields of view overlap, can be useful for efficiently navigating through the videos and finding clues. Manually finding such links requires checking whether the backgrounds are similar and ensuring that the same objects appear in different videos over time, which is very time consuming. Fully automating this process is a hard task, as human knowledge is often required to recognize places or objects across different viewpoints. In this context, we present an active learning approach that makes human intelligence and the machine collaborate. More precisely, we introduce a video descriptor based on object category detection. In order to characterize the dynamic parts of a video, this descriptor is built by concatenating the proportions of the different object categories detected over time. In addition, a semantic background video descriptor is proposed to characterize the static parts. These two descriptors are then used in an algorithm that finds overlapping links between videos. Depending on the similarity values, the algorithm either automatically classifies the videos as having overlapping fields of view, as not having overlapping fields of view, or as requiring expert knowledge. In the latter case, the parameters of the algorithm are updated so that it decides by itself more and more often when the expert's answer matches its expectation, and so that it more often predicts the other class when the answer does not. The efficiency of the whole proposed approach is validated by an evaluation and a comparison with other active learning set-ups on a set of 63 real videos from public multi-view datasets.
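To make the mechanics concrete, here is a minimal sketch of the two ingredients the abstract describes: the dynamic-part descriptor built from concatenated per-window object-category proportions, and a two-threshold decision rule that answers automatically when confident and otherwise defers to the expert. All names, the window size, the similarity range, and the threshold-update rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical sketch: window size, similarity scale and update rule are
# illustrative assumptions, not the authors' exact method.

def dynamic_descriptor(detections_per_frame, num_categories, window=30):
    """Concatenate per-window proportions of detected object categories.

    detections_per_frame: list of lists of category ids, one list per frame,
    as produced by any off-the-shelf object detector.
    """
    parts = []
    for start in range(0, len(detections_per_frame), window):
        counts = np.zeros(num_categories)
        for frame in detections_per_frame[start:start + window]:
            for cat in frame:
                counts[cat] += 1
        total = counts.sum()
        parts.append(counts / total if total > 0 else counts)
    return np.concatenate(parts) if parts else np.zeros(num_categories)


class OverlapClassifier:
    """Two-threshold active decision on a similarity value in [0, 1]:
    answer automatically outside the uncertain band, ask the expert inside it."""

    def __init__(self, low=0.3, high=0.7, step=0.02):
        self.low, self.high, self.step = low, high, step

    def decide(self, similarity, ask_expert):
        if similarity >= self.high:
            return True    # classified as overlapping fields of view
        if similarity <= self.low:
            return False   # classified as non-overlapping fields of view
        answer = ask_expert()  # uncertain region: defer to the expert
        # Illustrative update: shift the threshold on the expert's side so the
        # automatic region grows there, meaning the algorithm decides by itself
        # more often when answers confirm its leaning, and leans toward the
        # other class when they do not.
        if answer:
            self.high = max(self.low, self.high - self.step)
        else:
            self.low = min(self.high, self.low + self.step)
        return answer
```

In the setting described above, the similarity fed to `decide` would be computed between the dynamic and semantic background descriptors of the two videos; any score normalized to [0, 1] can be plugged into this sketch.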
