Federated learning offers a compelling approach to training artificial intelligence systems in decentralized settings, prioritizing data privacy over traditional centralized training. Understanding the correlations among high-level threats that exhibit abnormal behavior in the data stream is essential for building cyber–physical systems that remain resilient to diverse attacks under continuous data exchange. This work introduces a novel vertical federated multi-agent learning framework that models attacker and defender agents in both stationary and non-stationary vertical federated learning environments. In stationary environments, our approach employs synchronous Deep Q-Network (DQN) based agents, which facilitate convergence toward optimal strategies; in non-stationary environments, it employs synchronous Advantage Actor–Critic (A2C) based agents, which adapt to the dynamic nature of multi-agent vertical federated reinforcement learning (VFRL). This methodology enables us to simulate and analyze the adversarial interplay between attacker and defender agents and supports robust policy development. Extensive analysis demonstrates the effectiveness of our approach, showing that it learns optimal policies in both static and dynamic setups. A comparative evaluation against baseline schemes confirms the efficacy of our methodology: the proposed scheme achieves 15.93%, 32.91%, 31.02%, and 47.26% higher performance than A3C, DDQN, DQN, and REINFORCE, respectively. These results advance cybersecurity in federated learning contexts by enabling the formulation of robust defense policies.