Abstract

We propose a weakly supervised framework for domain adaptation in a multi-modal, multi-label classification setting. The framework is applied to annotate objects, such as animals, in a subtitled target video in the absence of visual demarcators. We start from classifiers trained on external data (the source; in our setting, ImageNet) and iteratively adapt them to the target dataset using textual cues from the subtitles. Experiments on a challenging dataset of wildlife documentaries validate the framework, yielding a final F1 measure of approximately 70%, a significant improvement over the state-of-the-art baseline of applying ImageNet-trained classifiers without adaptation. The methods proposed here take us a step closer to object recognition in the wild and automatic video indexing.
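
The iterative adaptation loop described above can be illustrated, in spirit, with a minimal self-training sketch. Everything below is an assumption made for illustration: the synthetic features, the logistic-regression stand-in for the ImageNet-trained classifiers, the 0.8 confidence threshold, and the binary subtitle cue are all hypothetical and do not reproduce the paper's actual method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-ins for real data: source features/labels (the role
# ImageNet plays here) and target video frames, each paired with a flag
# saying whether the subtitles mention the animal of interest.
X_source = rng.normal(size=(200, 16))
y_source = (X_source[:, 0] > 0).astype(int)        # 1 = animal present
X_target = rng.normal(loc=0.3, size=(100, 16))     # shifted target domain
subtitle_mentions = rng.random(100) > 0.5          # textual cue from subtitles

# Step 1: train on the external source data.
clf = LogisticRegression().fit(X_source, y_source)

# Step 2: iteratively adapt to the target. Frames where a confident visual
# prediction agrees with the subtitle cue are pseudo-labelled as positives
# and added to the training set; the classifier is then re-trained.
for _ in range(5):
    proba = clf.predict_proba(X_target)[:, 1]
    confident = (proba > 0.8) & subtitle_mentions  # text filters pseudo-labels
    if not confident.any():
        break
    X_aug = np.vstack([X_source, X_target[confident]])
    y_aug = np.concatenate([y_source, np.ones(confident.sum(), dtype=int)])
    clf = LogisticRegression().fit(X_aug, y_aug)
```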
