In recent years, there has been a growing interest in using model-free deep reinforcement learning (DRL)-based controllers as an alternative approach to improve the dynamic behavior, efficiency, and other aspects of DC–DC power electronic converters, which are traditionally controlled based on small signal models. These conventional controllers often fail to self-adapt to various uncertainties and disturbances. This paper presents a design methodology using proximal policy optimization (PPO), a widely recognized and efficient DRL algorithm, to make near-optimal decisions for real buck converters operating in both continuous conduction mode (CCM) and discontinuous conduction mode (DCM) while handling resistive and inductive loads. Challenges associated with delays in real-time systems are identified. Key innovations include a chattering-reduction reward function, engineering of input features, and optimization of neural network architecture, which improve voltage regulation, ensure smoother operation, and optimize the computational cost of the neural network. The experimental and simulation results demonstrate the robustness and efficiency of the controller in real scenarios. The findings are believed to make significant contributions to the application of DRL controllers in real-time scenarios, providing guidelines and a starting point for designing controllers using the same method in this or other power electronic converter topologies.