Change detection in moving-camera videos with limited samples using twin-CNN features and learnable morphological operations

Rafael Padilla, Allan F. Da Silva, Eduardo A. B. Da Silva, Sergio L. Netto

doi:10.1016/j.image.2023.116969

Abstract

This paper presents a new system, suitable for small-scale datasets, for detecting changes between reference and target videos. Twin pre-trained ResNet-50 features are processed by a learning-based pipeline with a limited number of adjustable parameters, allowing end-to-end training even on relatively small databases. This is achieved with two innovative modules in tandem: a low-complexity dissimilarity module and a post-processing step using learnable morphological operations. Both can be incorporated seamlessly into gradient-based optimization procedures. The pipeline ends with temporal consistency and change classification modules, and it is evaluated on the VDAO dataset, a challenging database of videos recorded with moving cameras in a cluttered industrial environment. Ablation studies show how each proposed module contributes to the final system performance, with a prominent role for the two newly proposed ones. Results indicate that the proposed system achieves a detection performance about 18% superior to that of current state-of-the-art methods. Software, results, and a pre-trained architecture of the proposed framework are available at https://github.com/rafaelpadilla/TCF-LMO.

Full Text