Human vision supports social perception by efficiently detecting agents and extracting rich information about their actions, goals, and intentions. Here, we explore the cognitive architecture of perceived animacy by constructing Bayesian models that integrate domain-specific hypotheses of social agency with domain-general cognitive constraints on sensory, memory, and attentional processing. Our model posits that perceived animacy combines a bottom-up, feature-based, parallel search for goal-directed movements with a top-down selection process for intent inference. The interaction of these architecturally distinct processes makes perceived animacy fast, flexible, and yet cognitively efficient. In the context of chasing, in which a predator (the "wolf") pursues a prey (the "sheep"), our model addresses the computational challenge of identifying target agents among varying numbers of distractor objects, despite a quadratic increase in the number of possible interactions as more objects appear in a scene. By comparing modeling results with human psychophysics in several studies, we show that the effectiveness and efficiency of human perceived animacy can be explained by a Bayesian ideal observer model with realistic cognitive constraints. These results provide an understanding of perceived animacy at the algorithmic level: how it is achieved by cognitive mechanisms such as attention and working memory, and how it can be integrated with higher-level reasoning about social agency.