This manuscript addresses the problem of distributing a limited vaccine supply within a population modeled as a contact network, with the goal of mitigating the spread of infectious disease. The proposed method combines reinforcement learning with graph neural networks. To characterize the dynamics of disease propagation, the authors construct an analytical model that gives the conditions under which the disease is eradicated or becomes endemic. This model underpins a series of simulation experiments across several scenarios, in which the proposed method outperforms random and centrality-based allocation in reducing the average number of infections per individual during an outbreak. The method also performs robustly across networks of different sizes and configurations, which supports its real-world applicability. The findings have implications for public health policy and resource allocation, offering a framework for managing infectious disease outbreaks in complex and dynamic environments.