Abstract

Referring Expression Generation (REG) aims to generate unambiguous descriptions of a referred object in a context such as an image. In practice, people often build references in installments through dialogue, extending an initial basic noun phrase until the reference is unambiguous. However, most existing REG models generate Referring Expressions (REs) in a “one-shot” way and therefore cannot benefit from this interactive process. In this paper, we propose to model REG based on dialogues. To this end, we first introduce a RE-oriented visual dialogue (VD) task, ReferWhat?!, and then build two large-scale datasets, RefCOCOVD and RefCOCO+VD, for this task by making use of the existing RE datasets RefCOCO and RefCOCO+, respectively. We finally propose a VD-based REG model. Experimental results show that our model outperforms all existing “one-shot” REG models. Our ablation studies further show that modeling REG as a dialogue agent allows the model to exploit information in the responses from dialogues, which is not available to “one-shot” models. The source code and datasets will be available at https://github.com/llxuan/ReferWhat soon.
