In a matter of mere milliseconds, conversational partners can transform their expectations about the world to accord with another person's perspective. Yet in similar situations, the exact opposite also appears to be true. Rather than being at odds, these findings suggest that multiple contextual and processing constraints guide when and how people consider perspective. These constraints are shaped by a host of factors, including the availability of social and environmental cues, intrinsic biases, and cognitive abilities. To explain how these factors might be integrated in a new way forward, we turn to an adaptive account of interpersonal interaction. This account draws on basic principles of dynamical systems, principles that we argue are already expressed, both implicitly and explicitly, across a broad landscape of existing research. We then showcase an initial attempt to instantiate some of these principles in a computational framework. This framework, built on what we argue are important mechanistic insights rendered by neural network models, draws on a promising and long-standing approach that has yet to take hold in this domain. We argue that bridging this gap may reveal new insights into other theoretical accounts, such as the connections between memory and common ground information.