Abstract

Various types of saliency cues exist, all of which can be instrumental in foreground extraction. This raises the interesting problem of combining them effectively. Earlier works either fuse them in the spatial domain or introduce dedicated terms in the energy function to cater to multiple cues. In contrast, this paper investigates the appearance domain and proposes a novel appearance fusion framework, which we refer to as AppFuse. It is an intuitive framework for fusing candidate appearance models into the desired one for an energy function, so no alterations to the energy function are required. Like any fusion strategy, the proposed framework requires guidance, which we provide through reliability and mutual consensus. To demonstrate its efficacy, we apply it to a foreground extraction problem, video co-localization, where we propose two novel concepts: (i) hierarchical co-saliency and (ii) mask-specific proposals. By respecting our framework and various spatiotemporal constraints, the fusion results highlight similar objects strongly enough for localization. An exhaustive set of experiments using both hand-crafted and learned saliency cues reveals that our approach comfortably outperforms several competing localization methods on standard benchmark datasets.
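The abstract does not spell out the fusion rule itself; as a rough illustration only, the sketch below fuses candidate appearance models (represented here as colour histograms, one per saliency cue) into a single model using weights driven by per-cue reliability and pairwise consensus. The function name `fuse_appearance_models`, the histogram representation, and the specific weighting scheme are assumptions made for this sketch, not the AppFuse formulation.

```python
import numpy as np

def fuse_appearance_models(models, reliability):
    """Illustrative fusion of per-cue appearance models (colour histograms).

    models      : (K, B) array -- K candidate histograms over B colour bins,
                  each normalised to sum to 1 (one per saliency cue).
    reliability : (K,) array   -- per-cue reliability scores in [0, 1].
    """
    models = np.asarray(models, dtype=float)
    rel = np.asarray(reliability, dtype=float)
    K = models.shape[0]

    # Mutual consensus (assumed here as histogram intersection):
    # a cue is rewarded when its model agrees with the other candidates.
    consensus = np.zeros(K)
    for i in range(K):
        others = [np.minimum(models[i], models[j]).sum()
                  for j in range(K) if j != i]
        consensus[i] = np.mean(others) if others else 1.0

    # Guidance weights combine reliability and consensus, then normalise.
    weights = rel * consensus
    weights /= weights.sum()

    # Fused appearance model: convex combination of the candidates.
    fused = weights @ models
    return fused / fused.sum(), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cues = rng.random((3, 16))
    cues /= cues.sum(axis=1, keepdims=True)       # three candidate models
    fused, w = fuse_appearance_models(cues, np.array([0.9, 0.7, 0.4]))
    print("fusion weights:", np.round(w, 3))
```

The fused model can then be plugged directly into an existing energy function in place of a single-cue appearance model, which is the sense in which the paper argues no alteration of the energy function is needed.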
