Object change detection (OCD), which aims to segment moving objects from an input frame, has attracted growing attention. Most existing OCD algorithms either rely on scene diversity or ignore the inter-frame spatiotemporal structural dependence, which limits their applicability. In this paper, we propose a motion-appearance-aware network (MAAN) for learning robust feature representations. Specifically, we design a multi-time-scale information mining module that adaptively adjusts information elements to refine the motion features. Meanwhile, salient object knowledge is obtained with the help of the extracted appearance features. To enhance semantic consistency and trim redundant connections, we construct a fusion module, called multi-view feature evolution, that effectively fuses motion and appearance information through global communication and local guidance. Moreover, we develop two strategies for obtaining uniform and consistent moving objects during information propagation: one feeds the predicted mask of the previous frame into the decoder, and the other supplies multi-time-scale motion cues at different levels to the decoder. Finally, extensive experiments on four public datasets (i.e., LASIESTA, CDnet2014, INO, and AICD) indicate that the proposed approach outperforms competing methods.
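
The following is a minimal, illustrative PyTorch sketch of the fusion-and-propagation ideas summarized above: fusing motion and appearance features through a global channel-wise interaction and a local spatial gate, and feeding the previous frame's predicted mask into the decoder. All module names, layer sizes, and the specific attention/gating choices are assumptions made for illustration only; they are not the authors' MAAN implementation.

```python
# Illustrative sketch only; module names and design details are assumptions,
# not the MAAN architecture described in the paper.
import torch
import torch.nn as nn


class MotionAppearanceFusion(nn.Module):
    """Hypothetical fusion block: global communication via channel weighting
    over the concatenated streams, local guidance via a spatial gate computed
    from the appearance stream."""

    def __init__(self, channels: int):
        super().__init__()
        # Global communication: squeeze-and-excitation-style channel weighting.
        self.global_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Local guidance: per-pixel gate derived from the appearance feature.
        self.local_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, motion: torch.Tensor, appearance: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([motion, appearance], dim=1)
        fused = fused * self.global_fc(fused)        # global communication
        fused = self.project(fused)
        fused = fused * self.local_gate(appearance)  # local guidance
        return fused


class MaskGuidedDecoder(nn.Module):
    """Hypothetical decoder head that also receives the previous frame's
    predicted mask, one of the propagation strategies mentioned above."""

    def __init__(self, channels: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(channels + 1, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
        )

    def forward(self, feature: torch.Tensor, prev_mask: torch.Tensor) -> torch.Tensor:
        x = torch.cat([feature, prev_mask], dim=1)
        return torch.sigmoid(self.head(x))           # per-pixel object probability


if __name__ == "__main__":
    c, h, w = 64, 32, 32
    motion = torch.randn(1, c, h, w)      # e.g., multi-time-scale motion feature
    appearance = torch.randn(1, c, h, w)  # e.g., appearance / salient-object feature
    prev_mask = torch.zeros(1, 1, h, w)   # predicted mask from the previous frame
    fused = MotionAppearanceFusion(c)(motion, appearance)
    mask = MaskGuidedDecoder(c)(fused, prev_mask)
    print(mask.shape)                     # torch.Size([1, 1, 32, 32])
```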