The increased availability of cost-effective computational resources in the cloud, in combination with state-of-the-art simulation algorithms, has opened the door to generating large quantities of simulated high-fidelity room impulse responses. Such datasets can, e.g., be used for training machine-learning-based audio signal processing algorithms such as speech enhancement or dereverberation, as well as for virtual prototyping of audio devices under different acoustic conditions. The authors have previously presented a cloud-based simulation engine that is well suited to such large-scale dataset generation tasks. The simulation engine leverages a hybrid wave-based/geometrical-acoustics solver, combined with various geometry pre-processing and spatial audio post-processing technologies. In this paper, we present a dataset generation case study using this simulation engine, as well as a novel validation study of certain aspects of the spatial audio post-processing. Finally, we present an analysis of computational performance when leveraging multi-node architectures to run extremely large wave-based simulations with the simulation framework. To the best of our knowledge, this is the first time a multi-node wave-based acoustics solver has been presented, enabling wave-based simulations with billions of degrees of freedom.