The European research project ARTEM (Aircraft noise Reduction Technologies and related Environmental iMpact) develops innovative aircraft noise reduction technologies, such as advanced engine fan linings, metamaterials and low-noise high-lift systems, applied to a blended wing body, a vehicle configuration with enhanced shielding of engine noise. By means of aircraft flyover auralisation in laboratory listening experiments, such future technologies can be evaluated with respect to human sound perception. To assess the reliability of such perception-based evaluations, the simulation chain should be validated against flyovers of existing aircraft. This contribution presents a systematic and rigorous hierarchical validation of auralisations of current jet aircraft using field recordings. Uncertainty in the source modelling is accounted for by using two different prediction tools for the partial sound sources. In addition to comparing computed noise indicators, a psychoacoustic validation is carried out in laboratory listening experiments with a 3D loudspeaker array. The validation comprises three levels: (i) direct comparison of auralisations with recordings to study whether auralisations can be identified as such, (ii) ranking of auralisations and recordings with respect to plausibility, and (iii) subjective annoyance ratings to test whether auralisations and recordings differ with respect to noise effects. Furthermore, first results of a comparison between a future concept and a current aircraft are presented.