Abstract
A major aim of gravitational wave astronomy is to observationally test the Kerr nature of black holes. The strongest such test, with minimal additional assumptions, is provided by observations of multiple ringdown modes, also known as black hole spectroscopy. For the gravitational wave merger event GW190521, we have previously claimed the detection of two ringdown modes emitted by the remnant black hole. In this paper we provide further evidence for the detection of multiple ringdown modes from this event. We analyse the recovery of simulated gravitational wave signals designed to replicate the ringdown properties of GW190521. We quantify how often our detection statistic reports strong evidence for a sub-dominant $(\ell,m,n)=(3,3,0)$ ringdown mode even when no such mode is present in the simulated signal. We find that this occurs with a probability of only $\sim 0.02$, which is consistent with the Bayes factor of $56 \pm 1$ (1$\sigma$ uncertainty) found for GW190521. We also quantify the significance of our agnostic analysis of GW190521, in which no relationship is assumed between ringdown modes, and find that only 1 in 250 simulated signals without a $(3,3,0)$ mode yields a result as significant as GW190521. Conversely, we verify that when simulated signals do have an observable $(3,3,0)$ mode, they consistently yield strong evidence and significant agnostic results. We also find that the constraints on deviations from the $(3,3,0)$ mode obtained for GW190521-like signals containing a $(3,3,0)$ mode are consistent with those obtained in our previous analysis of GW190521. Our results support our previous conclusion that the gravitational wave signal from GW190521 contains an observable sub-dominant $(\ell,m,n)=(3,3,0)$ mode.
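
As a rough illustration of why a probability of $\sim 0.02$ can be read as consistent with a Bayes factor of $56 \pm 1$, the sketch below converts a Bayes factor into the posterior probability of the disfavoured "no $(3,3,0)$ mode" hypothesis. It assumes equal prior odds between the two hypotheses. That assumption, and the function name `null_posterior_probability`, are introduced here for illustration only and are not taken from the analysis itself. The agnostic result of 1 in 250 is included as a plain fraction for comparison.

```python
# Illustrative sketch only: relates a Bayes factor to the posterior probability
# of the disfavoured hypothesis. Equal prior odds are an assumption made here,
# not a statement about how the paper's analysis was performed.


def null_posterior_probability(bayes_factor: float, prior_odds: float = 1.0) -> float:
    """Posterior probability of the 'no (3,3,0) mode' hypothesis.

    bayes_factor: evidence ratio in favour of the (3,3,0) mode being present.
    prior_odds:   prior odds in favour of the mode (1.0 means equal priors).
    """
    posterior_odds = bayes_factor * prior_odds
    return 1.0 / (1.0 + posterior_odds)


bayes_factor = 56.0        # value quoted for GW190521
bayes_factor_sigma = 1.0   # quoted 1-sigma uncertainty

# Central value and the range spanned by the quoted 1-sigma uncertainty.
p_central = null_posterior_probability(bayes_factor)
p_low = null_posterior_probability(bayes_factor + bayes_factor_sigma)
p_high = null_posterior_probability(bayes_factor - bayes_factor_sigma)

# Fraction of mode-free simulations as significant as GW190521 (agnostic analysis).
agnostic_fraction = 1 / 250

print(f"Null posterior probability: {p_central:.4f} (range {p_low:.4f} to {p_high:.4f})")
print(f"Agnostic false-alarm fraction: {agnostic_fraction:.3f}")
```

Under these assumptions the null posterior probability is $1/57 \approx 0.018$, which rounds to the quoted $\sim 0.02$. The agnostic fraction of $1/250 = 0.004$ is a separate frequentist statement and is not derived from the Bayes factor.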