Abstract

The ability to predict the trajectory of an autonomous vehicle accurately is crucial for safe and efficient navigation. However, predicting diverse and multimodal futures is challenging. Recent approaches based on attention mechanisms and graph neural networks have achieved state‐of‐the‐art performance by considering agent interactions and map context. This study focuses on multi‐agent prediction using an agent‐centric approach with transformers, which enables parallel computation and a comprehensive understanding of the environment. Two main features are introduced: an adaptive receptive field (ARF), which captures the relevant surroundings of each agent, and perception encoding, which serves as a spatial context embedding. The ARF adapts to the agent's velocity and rotation, focusing attention ahead at high speeds and to the sides at lower speeds. Perception encoding divides agents or lanes into levels and encodes the information of each level, which enables efficient encoding of complex spatial relationships. The proposed method combines these features with transformer modelling for multi‐agent trajectory prediction while retaining real‐time prediction capability. The approach is evaluated on the Argoverse benchmark and achieves better performance than the state‐of‐the‐art baseline. By addressing challenges such as multimodal outputs and robustness, the study contributes to safer and more efficient autonomous driving through more accurate trajectory prediction.
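To make the idea of an adaptive receptive field concrete, the following is a minimal illustrative sketch and not the formulation used in this study. It assumes the field is an ellipse centred on the agent and aligned with its heading, whose forward extent grows and whose lateral extent shrinks as speed increases. The names `Agent`, `receptive_field_axes`, and `in_receptive_field`, and the parameters `base_radius`, `max_speed`, and `stretch`, are hypothetical choices made only for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Agent:
    x: float        # position in metres (world frame)
    y: float
    heading: float  # direction of travel in radians
    speed: float    # metres per second


def receptive_field_axes(speed, base_radius=20.0, max_speed=20.0, stretch=2.0):
    """Return (forward, lateral) half-extents of the field in metres.

    Assumed rule (illustrative only): interpolate linearly with speed, so a
    stationary agent looks further to the sides and a fast agent looks
    further ahead.
    """
    ratio = min(max(speed, 0.0) / max_speed, 1.0)
    forward = base_radius * (1.0 + (stretch - 1.0) * ratio)
    lateral = base_radius * (stretch - (stretch - 1.0) * ratio)
    return forward, lateral


def in_receptive_field(agent, px, py):
    """Check whether the world-frame point (px, py) lies inside the ellipse."""
    dx, dy = px - agent.x, py - agent.y
    # Rotate the offset into the agent's frame: lon points along the heading.
    cos_h, sin_h = math.cos(agent.heading), math.sin(agent.heading)
    lon = dx * cos_h + dy * sin_h
    lat = -dx * sin_h + dy * cos_h
    forward, lateral = receptive_field_axes(agent.speed)
    return (lon / forward) ** 2 + (lat / lateral) ** 2 <= 1.0


fast = Agent(x=0.0, y=0.0, heading=0.0, speed=20.0)
slow = Agent(x=0.0, y=0.0, heading=0.0, speed=0.0)
print(in_receptive_field(fast, 35.0, 0.0), in_receptive_field(fast, 0.0, 35.0))
print(in_receptive_field(slow, 35.0, 0.0), in_receptive_field(slow, 0.0, 35.0))
```

Under these assumptions, a point 35 m ahead falls inside the field of the fast agent but not the slow one, and a point 35 m to the side does the opposite. A field like this could be used to select which neighbouring agents and lane segments are passed to the attention layers, although the actual selection rule in the paper may differ.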