During visual processing, input that is continuous in space and time is segmented, resulting in the representation of discrete tokens: objects or events. A great deal of research has explored how object representations are generalized into types, as when we see an object as an instance of a broader category (e.g., an animal or a plant). Much less attention has been paid, however, to the possibility that vision represents dynamic information in terms of a small number of primitive event types (such as twisting or bouncing). (In models that posit a "language of vision," these would be the foundational visual verbs.) Here we ask whether such event types are extracted spontaneously during visual perception, even when they are entirely task-irrelevant during passive viewing. We exploited the phenomenon of categorical perception, wherein differences are more readily noticed when they are represented in terms of different underlying categories. Observers were better at detecting changes to images or short videos when those changes involved switches in the underlying event type, even when changes that maintained the same event type were objectively larger (in terms of both brute image metrics and higher-level feature change). We observed this categorical "cross-event-type" advantage in visual working memory for twisting versus rotating, scooping versus pouring, and rolling versus bouncing. Moreover, additional control experiments confirmed that these effects could not be explained by appeal to lower-level, non-categorical stimulus differences. This spontaneous perception of "visual verbs" might promote both generalization and prediction about how events are likely to unfold.