Abstract
Contrastive learning exhibits remarkable transfer learning capabilities, but many current methods are pretrained with instance-level or pixel-level pretext tasks alone, so the learned representations lack either local or global information. This paper proposes a contrastive representation learning method that mixes instance-level and pixel-level learning, contrastive mixture of instance and pixel, which adaptively aggregates global and local information to better guide representation learning. Specifically, at the instance level, multiple momentum encoders extract features from multiple data augmentations of each image, providing more negative samples in every iteration. At the pixel level, feature map multiplication is used to aggregate spatial features. Experiments show that contrastive mixture of instance and pixel transfers well to both image-level and pixel-level prediction tasks and does not require large training batches. It achieves 70.9 Acc@1 on ImageNet linear classification, 56.8 AP on Pascal Visual Object Classes object detection, 75.4 mIoU on Cityscapes semantic segmentation, and 39.4/34.2 AP on Microsoft Common Objects in Context object detection and instance segmentation.
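To make the two pretext components concrete, below is a minimal PyTorch-style sketch of (a) a momentum-encoder update and MoCo-style instance-level InfoNCE loss with a negative queue, and (b) pixel-level spatial aggregation via feature map multiplication. This is not the authors' released code; the function names (`momentum_update`, `instance_loss`, `pixel_aggregate`), the temperature, and the queue layout are assumptions made for illustration.

```python
# Hypothetical sketch of the instance-level and pixel-level components;
# names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F


@torch.no_grad()
def momentum_update(q_encoder: torch.nn.Module,
                    k_encoder: torch.nn.Module,
                    m: float = 0.999) -> None:
    """EMA update of the momentum (key) encoder from the query encoder."""
    for q_param, k_param in zip(q_encoder.parameters(), k_encoder.parameters()):
        k_param.data.mul_(m).add_(q_param.data, alpha=1.0 - m)


def instance_loss(q: torch.Tensor, k: torch.Tensor,
                  queue: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """Instance-level InfoNCE loss.

    q, k: (B, D) projections of two augmentations of the same images.
    queue: (K, D) embeddings produced by momentum encoders, used as negatives.
    """
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    l_pos = torch.einsum('bd,bd->b', q, k).unsqueeze(1)              # (B, 1)
    l_neg = torch.einsum('bd,kd->bk', q, F.normalize(queue, dim=1))  # (B, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)


def pixel_aggregate(feat: torch.Tensor) -> torch.Tensor:
    """Aggregate spatial context by feature map multiplication.

    feat: (B, C, H, W). Each pixel embedding is re-expressed as a
    similarity-weighted sum over all pixels of the same feature map.
    """
    b, c, h, w = feat.shape
    x = feat.flatten(2)                                    # (B, C, HW)
    sim = torch.einsum('bci,bcj->bij', x, x) / c ** 0.5    # (B, HW, HW)
    attn = sim.softmax(dim=-1)
    agg = torch.einsum('bij,bcj->bci', attn, x)            # (B, C, HW)
    return agg.view(b, c, h, w)
```

In this reading, the aggregated pixel features would feed a pixel-level contrastive loss between corresponding locations of two augmented views, while `instance_loss` operates on pooled global embeddings; the exact combination of the two losses is specified in the paper, not here.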