Abstract
In complex stimuli, there are many different possible ways to refer to a specified target. Previous studies have shown that when people are faced with such a task, the content of their referring expression reflects visual properties such as size, salience, and clutter. Here, we extend these findings and present evidence that (i) the influence of visual perception on sentence construction goes beyond content selection and in part determines the order in which different objects are mentioned and (ii) order of mention influences comprehension. Study 1 (a corpus study of reference productions) shows that when a speaker uses a relational description to mention a salient object, that object is treated as being in the common ground and is more likely to be mentioned first. Study 2 (a visual search study) asks participants to listen to referring expressions and find the specified target; in keeping with the above result, we find that search for easy-to-find targets is faster when the target is mentioned first, while search for harder-to-find targets is facilitated by mentioning the target later, after a landmark in a relational description. Our findings show that seemingly low-level and disparate mental “modules” like perception and sentence planning interact at a high level and in task-dependent ways.
Highlights
When referring to an entity in a visual scene, speakers often describe it relative to some nearby landmark: “the woman next to the stairs.” Previous research demonstrates that speakers choose these landmarks with reference to the visual properties of the scene, and in particular that they prefer those that are larger and easier to see (Kelleher et al, 2005; Duckham et al, 2010; Clarke et al, 2013)
Systems for automatic referring expression generation (REG) have given little attention to ordering in the past; our results suggest that the use of perceptual data may lead to both more human-like references and better performance
We examine the strategies chosen for all pairs consisting of a target and a non-image-region landmark
Summary
When referring to an entity (the target) in a visual scene, speakers often describe it relative to some nearby landmark: “the woman next to the stairs.” Previous research demonstrates that speakers choose these landmarks with reference to the visual properties of the scene, and in particular that they prefer those that are larger and easier to see (Kelleher et al, 2005; Duckham et al, 2010; Clarke et al, 2013). Alternative orders are available (“next to the stairs is a woman”), but most existing models of reference do not address the production format question: how speakers choose to package the content of a referring expression when it includes both a target and one or more disambiguating landmarks. The production and comprehension results indicate that dialogue participants’ perceptions of the scene have far-reaching effects on both referring expression generation (REG) and understanding. Visual perception is not confined to providing inputs to a content selection mechanism, as in many popular models, but also contributes toward high-level decisions about the expression’s structure.