Abstract
Due to overexposure, jitter, motion, and other spatiotemporally varying perturbations, collected images often undergo various visual distortions (e.g., deformation, partially occluded signs, fisheye perspective, affine or 3D projections, and in-plane and out-of-plane rotations) during acquisition or transmission. Deep neural networks (DNNs) perform poorly on such distorted images in high-level abstract tasks, e.g., object categorization and semantic segmentation. To overcome this limitation, a distortion-tolerant model denoted CapsNetSIFT is proposed to enhance the representability and detectability of targets in distorted imagery. We modify and integrate the capsule network (CapsNet) and the scale-invariant feature transform (SIFT), both of which boast innate invariance to spatial-scale transformations. Two key components, the customized multi-dimensional CapsNet (MD-CapsNet) and vector-matching SIFT (VM-SIFT), cooperate and reinforce each other: the former encodes and provides representative feature vectors for the latter, while the latter localizes scale-space-invariant interval dimensions (instead of pixels) and establishes correspondences between source standard images (high-quality training images) and distorted ones (testing images). The category of the source standard image with the most correspondences is then taken as the predicted category. Evaluation results on distorted target recognition (CUB-200-2011, Stanford Dogs, Stanford Cars, and our hand-crafted dataset) reveal that CapsNetSIFT significantly improves resistance to various simulated distortions and outperforms state-of-the-art methods with higher training and testing accuracy (93.97% and 91.03%, respectively).
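A minimal sketch of the correspondence-count decision rule described above is given below. The feature extraction (MD-CapsNet) and the exact VM-SIFT matching criterion are not specified in the abstract, so `count_matches` and the ratio test are assumptions for illustration; only the final rule, assigning the category of the reference image with the most matches, follows the text.

```python
import numpy as np

def count_matches(query_vecs, ref_vecs, ratio=0.8):
    """Count matches that pass a Lowe-style ratio test (assumed matching criterion)."""
    matches = 0
    for q in query_vecs:
        dists = np.linalg.norm(ref_vecs - q, axis=1)      # distances to all reference vectors
        order = np.argsort(dists)
        if len(order) > 1 and dists[order[0]] < ratio * dists[order[1]]:
            matches += 1
    return matches

def classify(query_vecs, references):
    """references: list of (category, feature_vectors) pairs for source standard images.
    Returns the category of the reference image with the most correspondences."""
    best_cat, best_score = None, -1
    for cat, ref_vecs in references:
        score = count_matches(np.asarray(query_vecs), np.asarray(ref_vecs))
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat
```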