Abstract

Active vision aims to equip computer vision methods with the ability to dynamically adjust the capturing sensor’s viewpoint, position, or parameters in real time. This dynamic capability can improve the accuracy of the perception process. However, training and evaluating an active vision model often requires a large number of annotated images captured under different sensor and environmental settings, in order to emulate actions such as moving around, approaching, or moving away from a person and thus effectively model the active perception dynamics. Collecting and annotating such datasets is a challenging and expensive task. To overcome these limitations, this paper introduces a synthetic image generation pipeline specifically designed to support active vision tasks. The pipeline is built on a highly realistic Unity-based simulation framework and allows for the generation of images depicting humans captured at varying view angles, distances, illumination conditions, and backgrounds, supporting a wide range of tasks. Two annotated datasets, namely ActiveHuman and ActiveFace, are generated using the pipeline, and the effectiveness of the proposed approach is demonstrated through a use case that involves training and evaluating an embedding-based active face recognizer. Furthermore, we demonstrate how the proposed generation approach enables extending existing active face recognition methods by training models that control both left/right movements and the distance to a subject, leveraging the additional information provided by the ActiveFace dataset. To facilitate replication and encourage the use of the generated datasets for training and evaluating other active vision approaches, the associated assets and the developed dataset generation pipeline will be made publicly available.
