A survey of methods for addressing the challenges of referring image segmentation

Lixia Ji, Yunlong Du, Yiping Dang, Wenzhao Gao, Han Zhang

doi:10.1016/j.neucom.2024.127599

Abstract

Referring image segmentation segments the target objects in an image under the guidance of a natural language description. The task differs from semantic segmentation and instance segmentation in that it poses unique challenges, such as multimodal information fusion, the variability of natural language expressions, and model robustness. In recent years, deep learning techniques have led to innovative ideas and methods for addressing these problems. We systematically analyze the main challenges of referring image segmentation and summarize the existing solutions, which include strategies for multimodal fusion, expression query, multimodal pre-training, and robustness. In addition, we provide an overview of several datasets commonly used in referring image segmentation and compare the performance of representative approaches across different datasets, visual backbone models, and threshold settings. We also discuss open challenges and future developments in the field. This survey is intended to provide a comprehensive technical reference for future researchers.

Full Text