This research introduces a new approach to supply chain optimization and management, termed Deep Reinforcement Learning based Supply Chain Optimization and Management (DRL-SCOM). The approach builds on recent advances in Deep Reinforcement Learning (DRL) by integrating Randomized Ensembled Double Q-learning (REDQ) with Trust Region Policy Optimization (TRPO), and it is designed to address the complex, dynamic conditions that characterize supply chain management. REDQ mitigates the overestimation bias commonly associated with conventional Q-learning by forming its learning targets from the minimum over a randomly selected subset of an ensemble of Q-functions. The resulting value estimates are more accurate, which in turn supports better policy improvement, a critical factor in managing supply chains effectively. TRPO contributes safe and stable policy updates by restricting each update to a trust region bounded by the average Kullback-Leibler (KL) divergence between the old and new policies. This stability is vital for maintaining robustness in the fluctuating environment of supply chain operations. Combining REDQ's ensemble-based value estimation with TRPO's trust-region updates enables the framework to navigate the complex, high-dimensional decision space typical of supply chains efficiently, so that decisions can be optimized in real time while staying within operational constraints. DRL-SCOM shows significant potential across several areas of supply chain management, including demand forecasting, inventory management, and logistics, where it can handle the nonlinearities and uncertainties prevalent in these domains. The DRL-SCOM framework therefore offers an innovative alternative to traditional supply chain management methods.
It paves the way for more agile, responsive, and intelligent systems that can adapt to changing market demands and operational challenges, and it represents a significant step toward making supply chain management a more advanced, data-driven, and adaptive field.
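The abstract does not specify how REDQ's targets are computed inside DRL-SCOM. As a minimal sketch of the general REDQ target rule, assuming the next-state Q-values of every critic in the ensemble have already been evaluated, the function below draws a random subset of critics and bootstraps from their minimum. It omits the entropy bonus used in the original REDQ formulation, and the function name `redq_target` and its parameters are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def redq_target(rewards, dones, next_q_ensemble, gamma=0.99, subset_size=2, rng=None):
    """Compute REDQ-style TD targets for a minibatch (illustrative sketch).

    rewards:          shape (B,), immediate rewards r
    dones:            shape (B,), 1.0 where s' is terminal, else 0.0
    next_q_ensemble:  shape (N, B), target critic i's estimate Q_i(s', a'),
                      with a' drawn from the current policy (assumed given)
    """
    rng = np.random.default_rng() if rng is None else rng
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    next_q_ensemble = np.asarray(next_q_ensemble, dtype=float)
    n_critics = next_q_ensemble.shape[0]
    if not 1 <= subset_size <= n_critics:
        raise ValueError("subset_size must be between 1 and the ensemble size")
    # Sample a random subset M of critics (without replacement) for this update.
    subset = rng.choice(n_critics, size=subset_size, replace=False)
    # Minimum over the subset: a pessimistic estimate that counters overestimation.
    min_q = next_q_ensemble[subset].min(axis=0)
    # Bootstrapped target; terminal transitions do not bootstrap.
    return rewards + gamma * (1.0 - dones) * min_q


# Hypothetical usage: 3 transitions, an ensemble of N = 10 critics.
rng = np.random.default_rng(0)
next_q = rng.normal(loc=50.0, scale=5.0, size=(10, 3))
y = redq_target([1.0, 0.0, 2.0], [0.0, 0.0, 1.0], next_q, gamma=0.95, subset_size=2, rng=rng)
print(y)
```

Because the subset is small relative to the ensemble, the minimum is less pessimistic than taking the minimum over all critics, which is how REDQ balances overestimation against underestimation.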
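The trust-region idea behind TRPO's "safe and stable policy updates" can likewise be illustrated with its backtracking line search, which shrinks a proposed step until the average KL divergence from the old policy stays within a bound and the surrogate objective improves. This is a minimal sketch under stated assumptions: the policy is a hypothetical tabular softmax over per-state logits, the search direction `full_step` is taken as given (full TRPO computes it with a conjugate-gradient natural-gradient solve, omitted here), and all function names and the example batch are illustrative, not DRL-SCOM's actual code.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mean_kl(p_old, p_new):
    # Average KL(pi_old || pi_new) over the sampled states.
    return float(np.mean(np.sum(p_old * (np.log(p_old) - np.log(p_new)), axis=-1)))


def surrogate(p_old, p_new, actions, advantages):
    # Importance-weighted surrogate E[pi_new(a|s) / pi_old(a|s) * A(s, a)].
    rows = np.arange(len(actions))
    ratio = p_new[rows, actions] / p_old[rows, actions]
    return float(np.mean(ratio * advantages))


def trpo_line_search(old_logits, full_step, actions, advantages,
                     max_kl=0.01, backtrack=0.5, max_iters=10):
    """Backtracking line search that keeps the update inside the trust region.

    old_logits: shape (B, A), per-state action logits (tabular policy, assumed)
    full_step:  shape (B, A), proposed step (assumed precomputed)
    """
    p_old = softmax(old_logits)
    baseline = surrogate(p_old, p_old, actions, advantages)
    for j in range(max_iters):
        frac = backtrack ** j
        candidate = old_logits + frac * full_step
        p_new = softmax(candidate)
        kl = mean_kl(p_old, p_new)
        gain = surrogate(p_old, p_new, actions, advantages) - baseline
        # Accept the first step that improves the surrogate and respects the KL bound.
        if kl <= max_kl and gain > 0:
            return candidate, kl, frac
    # No acceptable step found: keep the old policy unchanged.
    return old_logits, 0.0, 0.0


# Hypothetical batch: 4 states, 3 discrete actions (e.g. order sizes), uniform policy.
old = np.zeros((4, 3))
step = np.zeros((4, 3))
step[:, 0] = 10.0  # an aggressive push toward action 0
acts = np.array([0, 0, 0, 0])
adv = np.ones(4)
new_logits, kl, frac = trpo_line_search(old, step, acts, adv)
print(kl, frac)
```

Here the full step would move the policy far outside the trust region, so the search halves it repeatedly and accepts the first fraction whose KL divergence falls below `max_kl`.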