The mechanisms by which individuals visually perceive others within a group and select which of them to interact with are fundamental to many collective motion behaviors. However, they remain poorly understood theoretically, partly because neurological perception systems are difficult to model. Here, the classic zoom-lens model of visual attention is introduced into collective perception, yielding an attention-based model with a single parameter. Numerical experiments reveal that the proposed model offers a unified mechanism for collective perception in three respects. First, the model describes both conspecific (e.g., flocking) and heterospecific (e.g., predator-prey) collective behaviors. Second, for neighbor selection in conspecific groups, the model unifies the well-known topological and visibility models. Third, for prey selection in heterospecific groups, the model simultaneously replicates the well-known confusion and oddity effects. These findings demonstrate the fundamental role of visual attention in a diverse array of collective motion behaviors.