The search for global minimum (GM) configurations of nanoclusters is complicated by intricate potential energy landscapes containing numerous local minima, and this complexity grows with cluster size and compositional diversity. Evolutionary algorithms, such as genetic algorithms, are hampered by slow convergence and a tendency to settle prematurely on suboptimal solutions; the basin hopping technique likewise struggles to navigate these landscapes effectively, particularly at larger scales. These challenges call for more efficient methods of exploring the potential energy surfaces (PES) of nanoclusters. In response, we have developed a deep reinforcement learning (DRL) framework designed to explore the PES of nanoclusters and to identify GM configurations along with other low-energy states. This study demonstrates the framework's effectiveness across nanocluster types, including both mono- and multimetallic compositions, and its ability to navigate complex energy landscapes. The model remains adaptable and efficient as cluster sizes and feature-vector dimensions increase. This adaptability underscores the considerable potential of DRL in materials science, particularly for the efficient discovery and optimization of novel nanomaterials. To the best of our knowledge, this is the first DRL framework designed for GM search in nanoclusters, representing a significant innovation in the field.
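
For context, the sketch below is a minimal, illustrative implementation of the basin hopping baseline mentioned above, not the authors' DRL framework. It searches for low-energy geometries of a small Lennard-Jones cluster using SciPy's basinhopping routine; the cluster size, step size, and iteration count are assumed values chosen only for demonstration.

import numpy as np
from scipy.optimize import basinhopping

N_ATOMS = 7  # illustrative cluster size (an assumption, not from the study)

def lj_energy(flat_coords):
    # Total Lennard-Jones energy in reduced units (epsilon = sigma = 1).
    coords = flat_coords.reshape(-1, 3)
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = np.linalg.norm(coords[i] - coords[j])
            energy += 4.0 * (r**-12 - r**-6)
    return energy

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.5, 1.5, size=3 * N_ATOMS)  # random starting geometry

# Each basin hopping step perturbs the coordinates, relaxes them to the
# nearest local minimum, and accepts or rejects the new basin with a
# Metropolis criterion; the lowest minimum encountered is returned.
result = basinhopping(
    lj_energy,
    x0,
    niter=200,
    stepsize=0.5,
    minimizer_kwargs={"method": "L-BFGS-B"},
    seed=1,
)
print(f"Lowest energy found for {N_ATOMS} atoms: {result.fun:.4f} (LJ units)")

As the abstract notes, the cost of such stochastic restarts grows quickly with cluster size and composition, which is the gap the proposed DRL framework is intended to address.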