The nuclear sector represents the largest single source of carbon-free energy in the United States. Nevertheless, keeping existing nuclear assets competitive on the grid is key to their ability to continue providing dispatchable clean energy alongside renewable generation. Reducing fuel cycle cost by optimizing core loading patterns is one approach to improving their competitiveness. However, this optimization task involves multiple objectives and constraints, resulting in a vast number of candidate solutions that cannot be explicitly solved. While stochastic optimization (SO) methodologies are used by various nuclear utilities and vendors for fuel cycle reload design, manual design remains the preferred approach. To advance the state of the art in core reload pattern design, we have developed methods based on deep reinforcement learning (RL) for single- and multi-objective optimization. Previous research has laid the groundwork for these approaches and demonstrated their ability to discover high-quality patterns within a reasonable timeframe. Moreover, RL methods have outperformed SO in multi-objective settings. In this paper, we introduce a new paradigm that incorporates multi-objective RL, interpretable AI, and physics knowledge to further improve the performance of the algorithms. As the problem increases in complexity, the classical single-objective approach sometimes fails to find even feasible solutions. Leveraging physics information becomes pivotal to finding high-quality and more diverse solutions faster. These findings motivate the incorporation of more physics to further aid in the quest to surpass human intelligence. Future work will focus on scaling this method and addressing fuel performance and control rod limits to demonstrate that these new designs can be considered for a real reactor.