Abstract

Visual patterns are basic elements of images and represent the discernible regularity in the visual world; mining visual patterns is therefore a fundamental task in computer vision. Most previous studies assume that only one visual pattern exists in a category and build a one-to-one mapping using the category label. In reality, however, many categories contain multiple patterns, i.e., many-to-one mappings. Without prior knowledge of the patterns, few existing pattern mining methods can discover and distinguish the varied patterns within a category. To tackle this problem, we propose a novel framework, PaclMap, which learns medium-grained features to represent patterns. It comprises unsupervised pattern-level contrastive learning and a multipattern activation map, whose joint optimization encourages the network to mine patterns that are both discriminative and frequent within a category. Extensive experiments conducted on four benchmark datasets (Place-20, ImageNet Large Scale Visual Recognition Challenge (ILSVRC)-20, Visual Object Classes (VOC), and Travel) demonstrate that PaclMap outperforms six state-of-the-art methods, with average improvements of 2.9% in accuracy and 12.3% in frequency.
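The abstract describes pattern-level contrastive learning only at a high level. As a rough illustration of the general idea behind contrastive objectives over embeddings, the following is a minimal InfoNCE-style sketch in NumPy; the function name, shapes, and temperature are assumptions for illustration, not PaclMap's actual implementation.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Toy InfoNCE-style contrastive loss (illustrative, not the paper's code).

    Each anchor's positive is the same-index row in `positives`;
    every other row in the batch acts as a negative.
    """
    # L2-normalize so dot products become cosine similarities.
    anchors = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    positives = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = anchors @ positives.T / temperature   # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean negative log-probability of each matching (anchor, positive) pair.
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
# Perfectly aligned pairs should score a lower loss than mismatched pairs.
aligned = info_nce_loss(emb, emb)
shuffled = info_nce_loss(emb, rng.permutation(emb))
```

Minimizing such a loss pulls embeddings of the same pattern together and pushes different patterns apart, which is the intuition behind learning to distinguish multiple patterns within one category.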
