The energy harvesting capability of a graded metamaterial is maximised via reinforcement learning (RL) under realistic excitations at the microscale. The metamaterial consists of a waveguide to which a set of beam-like resonators of variable length, equipped with piezoelectric patches, is attached. The piezo-mechanical system is modelled through equivalent lumped parameters determined via a general impedance analysis. Realistic conditions are mimicked by considering either magnetic loading or random excitations, the latter scenario requiring the harvesting capability to be enhanced for a class of forcing terms with similar but not identical frequency content. The RL-based optimisation is empowered by using the physical understanding of wave propagation in such a locally resonant system to constrain the state representation and the action space. The outcomes of the procedure are compared against grading rules optimised through genetic algorithms. While genetic algorithms are more effective in the deterministic setting of magnetic loading, the proposed RL-based approach proves superior in the inherently stochastic setting of random excitations.