Deploying battery energy storage systems (BESS) and optimizing their scheduling are key research challenges in managing the intermittency and volatility of solar photovoltaic (PV) systems. However, a review of the existing literature reveals notable gaps in the optimal scheduling of energy systems: most models target a single objective, and only a few address multi-objective scenarios. This study examines grid-connected homes equipped with a BESS and a solar PV system. It applies four reinforcement learning (RL) algorithms, selected for their distinct training methodologies, to develop effective scheduling models. The findings demonstrate that the RL model based on Trust Region Policy Optimization (TRPO) manages the BESS and PV system effectively despite real-world uncertainties, and the case study confirms the suitability and effectiveness of this approach. The TRPO-based RL framework outperforms previous models in decision-making by selecting the most effective BESS scheduling strategies. The TRPO model achieved a higher mean self-sufficiency rate than the A3C (Asynchronous Advantage Actor-Critic), DDPG (Deep Deterministic Policy Gradient), and TAC (Twin Actor Critic) models, exceeding them by approximately 3 %, 0.72 %, and 3.5 %, respectively. By adapting to dynamic real-world conditions, the model improves energy autonomy and yields economic benefits. The framework is primarily intended for seamless integration into an automated energy plant environment, where it can facilitate regular electricity trading among multiple buildings. Backed by initiatives such as the Renewable Energy Certificate weight, this technology is expected to play a crucial role in maintaining the balance between power generation and consumption.
The MILP (Mixed-Integer Linear Programming) architecture achieved a self-sufficiency rate of 29.12 %, exceeding the rates of A3C, TRPO, DDPG, and TAC by 2.48 %, 0.64 %, 2 %, and 3.04 %, respectively.