Abstract

Current Computer Vision algorithms for classifying objects, such as Deep Nets, lack robustness to image changes which, although perceptible, would not fool a human observer. We quantify this by showing that the performance of Deep Nets degrades badly on images where the objects are partially occluded, and degrades even further in more challenging adversarial situations where, for example, patches are introduced into the images to target the weak points of the algorithm. To address this problem, we develop a novel architecture, called Compositional Generative Networks (Compositional Nets), which is innately robust to these types of image changes. This architecture replaces the fully connected classification head of the deep network with a generative compositional model that includes an outlier process. This enables it, for example, to localize occluders and subsequently focus on the non-occluded parts of the object. We conduct classification experiments in a variety of settings, including artificially occluded images, real images of partially occluded objects from the MS-COCO dataset, and adversarial patch attacks on PASCAL3D+ and the German Traffic Sign Recognition Benchmark. Our results show that Compositional Nets are much more robust to occlusion and adversarial attacks, such as patch attacks, than standard Deep Nets, even those that use data augmentation and adversarial training. Compositional Nets can also accurately localize these image changes, despite being trained only with class labels. We argue that testing vision algorithms in an adversarial manner that probes for their weaknesses, e.g., by patch attacks, is a more challenging way to evaluate them than standard methods, which simply test on a random set of samples, and that Compositional Nets have the potential to overcome such challenges.
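
To make the core idea concrete, the sketch below illustrates a compositional classification head with an outlier process: each spatial position of a backbone feature map is explained either by a class-specific template or by a generic occluder model, and positions won by the occluder form an occlusion mask. The function and variable names, the inner-product scoring (a stand-in for the von-Mises-Fisher-style likelihoods typically used in compositional models), and the single-template-per-class setup are all simplifying assumptions for illustration, not the authors' exact formulation.

```python
# Minimal sketch of a compositional head with an outlier (occluder) process.
# All names and the scoring function are illustrative assumptions.
import numpy as np

def compositional_head(features, class_templates, occluder_template):
    """Classify a feature map and localize likely occluders.

    features:          (C, H, W) backbone feature map, unit-normalized per position.
    class_templates:   (K, C, H, W) one spatial template of part activations per class.
    occluder_template: (C,) a generic "outlier" appearance shared by all classes.

    Returns (predicted_class, occlusion_mask), where occlusion_mask marks
    positions better explained by the occluder than by the winning class.
    """
    # Per-position score under each class template (inner product as a
    # simplified stand-in for a per-position log-likelihood).
    fg = np.einsum('kchw,chw->khw', class_templates, features)   # (K, H, W)
    # Per-position score under the shared outlier process.
    bg = np.einsum('c,chw->hw', occluder_template, features)     # (H, W)
    # Outlier process: each position is explained by whichever model fits
    # better, so occluded regions do not drag down the object score.
    per_pos = np.maximum(fg, bg[None])                           # (K, H, W)
    scores = per_pos.sum(axis=(1, 2))                            # (K,)
    k_star = int(np.argmax(scores))
    occlusion_mask = bg > fg[k_star]                             # True = occluded
    return k_star, occlusion_mask

# Toy usage with random data.
rng = np.random.default_rng(0)
f = rng.normal(size=(64, 7, 7))
f /= np.linalg.norm(f, axis=0, keepdims=True)
templates = rng.normal(size=(12, 64, 7, 7))
occluder = rng.normal(size=(64,))
cls, mask = compositional_head(f, templates, occluder)
print(cls, mask.mean())  # predicted class and estimated occluded fraction
```

The per-position max over object and occluder explanations is what lets the model ignore occluded regions at test time, and thresholding that comparison yields occluder localization without ever training on occlusion annotations.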
