In quantum computing, variational quantum algorithms (VQAs) represent a pivotal category of quantum solutions across a broad spectrum of applications. These algorithms demonstrate significant potential for realising quantum computational advantage. A fundamental aspect of VQAs is formulating expressive and efficient quantum circuits (ansatzes), and automating the search for such ansatzes is known as quantum architecture search (QAS). Recently, reinforcement learning (RL) techniques have been utilised to automate this search, an approach known as RL-QAS. This study investigates RL-QAS for constructing ansatzes tailored to the variational quantum state diagonalisation problem. Our investigation includes a comprehensive analysis of several aspects: the entanglement thresholds of the resultant states, the impact of initial conditions on the performance of the RL agent, the phase transition behaviour of correlation in concurrence bounds, and the discrete contributions of qubits in deducing eigenvalues through conditional entropy metrics. We leverage these insights to devise an entanglement-guided admissible ansatz in QAS to diagonalise random quantum states using optimal resources. Furthermore, the methodologies presented herein offer a generalised framework for constructing reward functions within RL-QAS applicable to variational quantum algorithms.