Abstract

In the field of remote sensing (RS) image semantic segmentation, existing supervised learning methods rely on large amounts of labeled data, which limits their applicability. To address this problem, we propose a new self-supervised learning method, the multi-scale fusion pixel and instance contrastive learning network (MPINet). The method first uses a focal frequency loss to improve the learning of high-level semantic information, and then strengthens the spatial information in shallow feature maps through multi-scale fusion pixel contrastive learning, thereby improving the model's ability to mine detailed features. Experiments are conducted on the International Society for Photogrammetry and Remote Sensing (ISPRS) Potsdam, LoveDA, and UAVid datasets. On the ISPRS Potsdam dataset, our method achieves 49.89%, 70.98%, and 63.55% in the mIoU, OA, and mF1 metrics, exceeding the previous best method by 1.48%, 1.71%, and 0.95%, respectively. On the LoveDA dataset, it achieves 38.05%, 52.89%, and 54.10% in mIoU, OA, and mF1, exceeding the previous best method by 0.6%, 0.91%, and 1.04%. On the UAVid dataset, it achieves mIoU, OA, and mF1 scores of 57.92%, 76.97%, and 73.28%, improvements of 1.06%, 1.46%, and 1.15% over the previous best method. These results show that the proposed method outperforms both the best existing self-supervised learning methods and ImageNet pre-training.
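As background for the first component mentioned above: focal frequency loss is an established loss that compares a prediction and a target in the 2D frequency domain and dynamically up-weights the frequencies with the largest errors, so that hard-to-learn spectral components dominate the gradient. The PyTorch sketch below is a minimal illustration of that general idea, not the authors' MPINet implementation; the function name, the `alpha` exponent, and the max-normalization of the weight matrix are our assumptions.

```python
import torch

def focal_frequency_loss(pred, target, alpha=1.0):
    """Minimal sketch of a focal frequency loss (hypothetical helper).

    pred, target: (B, C, H, W) tensors, e.g. reconstructed and original
    feature maps. The loss measures the distance between their 2D
    spectra, re-weighting each frequency by its own error magnitude.
    """
    # 2D FFT over the spatial dimensions (orthonormal scaling).
    pred_freq = torch.fft.fft2(pred, norm="ortho")
    target_freq = torch.fft.fft2(target, norm="ortho")

    # Squared distance between the complex spectra at each frequency.
    freq_dist = (pred_freq - target_freq).abs() ** 2

    # Dynamic spectrum weight: frequencies with larger errors get
    # larger weights; alpha controls how sharply they are focused.
    weight = freq_dist.sqrt() ** alpha
    weight = weight / weight.max().clamp(min=1e-8)  # normalize to [0, 1]
    weight = weight.detach()  # weights act as constants, no gradient

    return (weight * freq_dist).mean()

# Usage sketch: random tensors stand in for real feature maps.
pred = torch.randn(2, 3, 64, 64, requires_grad=True)
target = torch.randn(2, 3, 64, 64)
focal_frequency_loss(pred, target).backward()
```

In a self-supervised pipeline such as the one described in this abstract, the two inputs would typically be network outputs and their reconstruction targets; the sketch treats them simply as same-shaped tensors.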
