Real-time impulse response: a methodology based on Machine Learning approaches for a rapid impulse response generation for real-time Acoustic Virtual Reality systems

D.A. Sanaguano-Moreno, J.F. Lucio-Naranjo, R.A. Tenenbaum, G.B. Sampaio-Regattieri

doi:10.1016/j.iswa.2023.200306

Abstract

Simulating high-definition binaural room impulse responses with conventional approaches demands substantial computational resources and time, which makes these approaches unsuitable for real-time, high-quality acoustic virtual reality. This research implements a methodology for rapidly generating impulse responses from the position of a listener moving inside a fixed sound field. Each impulse response is generated from a compressed representation, of lower dimension than the original impulse response, learned by variational autoencoders and long short-term memory (LSTM) neural networks. First, the methodology uses a reliable room acoustic simulator to compute a representative set of impulse responses covering the area of interest. Second, it builds a dataset with enough uniformly distributed impulse responses through data augmentation, using a modified bilinear interpolation of the previously simulated impulse responses. Third, it applies an unsupervised model to cluster the impulse responses by position, reducing their variability within the given environment. Fourth, it splits each impulse response into time segments and builds one dataset per segment and cluster. Fifth, it trains a variational autoencoder with an LSTM network for each time segment and cluster to infer the corresponding compressed part of the impulse response. Once trained, the model generates the impulse response by assigning the current listener position to its cluster and executing the decoders of the corresponding variational autoencoders with LSTM. The findings are encouraging: the normalized mean absolute error between the impulse responses obtained by the interpolator and those generated by the proposed model is below 15% for 88% of the impulse responses reserved for testing.
Moreover, the mean absolute error of the interaural cross-correlation coefficient (IACC) between the impulse responses obtained from the simulator and those generated by the proposed model is around 16%, suggesting that most acoustic characteristics of the reference impulse response are preserved. In addition, generating a 300 ms impulse response segment takes approximately 65 ms, roughly half of the 112 ms total system latency for realistic auralization.
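The generation pipeline and the two evaluation metrics summarized above can be sketched in a few stdlib-only Python functions. This is a minimal illustration under stated assumptions, not the authors' implementation, and all function names are hypothetical. `bilinear_ir` uses plain bilinear weights because the abstract does not describe the *modified* interpolation. `kmeans_positions` assumes k-means as the unsupervised clustering model, which the abstract does not name. `generate_ir` takes hypothetical per-cluster, per-segment decoder callables that stand in for the trained VAE + LSTM decoders. `nmae` assumes normalization by the summed absolute amplitude of the reference. `iacc` follows the common convention of taking the maximum absolute normalized interaural cross-correlation over lags within ±1 ms, computed here on broadband, full-length signals rather than on band-filtered, time-windowed ones.

```python
import math
import random


def bilinear_ir(ir00, ir10, ir01, ir11, u, v):
    """Sample-wise bilinear blend of four corner impulse responses.

    (u, v) in [0, 1]^2 is the listener position normalized to the grid cell.
    Assumption: plain bilinear weights; the paper's modified variant is not reproduced.
    """
    w = ((1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v)
    return [w[0] * a + w[1] * b + w[2] * c + w[3] * d
            for a, b, c, d in zip(ir00, ir10, ir01, ir11)]


def nearest(position, centroids):
    """Index of the centroid closest (Euclidean) to a listener position."""
    return min(range(len(centroids)), key=lambda j: math.dist(position, centroids[j]))


def kmeans_positions(points, k, iters=50, seed=0):
    """Cluster 2-D positions with Lloyd's k-means (assumed clustering model)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New centroid is the coordinate-wise mean of the cluster members.
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids, labels


def split_segments(ir, fs, seg_ms):
    """Cut an impulse response into consecutive segments of seg_ms milliseconds."""
    n = max(1, round(fs * seg_ms / 1000))
    return [ir[i:i + n] for i in range(0, len(ir), n)]


def generate_ir(position, centroids, decoders):
    """Pick the listener's cluster, then concatenate that cluster's decoded segments.

    decoders[cluster] is a hypothetical sequence of callables, one per time segment,
    standing in for the trained VAE + LSTM decoders.
    """
    cluster = nearest(position, centroids)
    ir = []
    for decode in decoders[cluster]:
        ir.extend(decode(position))
    return ir


def nmae(reference, estimate):
    """Normalized MAE, assumed here as sum|y - y_hat| / sum|y|."""
    num = sum(abs(a - b) for a, b in zip(reference, estimate))
    return num / sum(abs(a) for a in reference)


def iacc(left, right, fs, max_lag_ms=1.0):
    """Max |normalized interaural cross-correlation| over lags within +/- max_lag_ms."""
    n = min(len(left), len(right))
    left, right = left[:n], right[:n]
    max_lag = int(fs * max_lag_ms / 1000)
    norm = math.sqrt(sum(x * x for x in left) * sum(x * x for x in right))
    best = 0.0
    for tau in range(-max_lag, max_lag + 1):
        acc = sum(left[t] * right[t + tau] for t in range(n) if 0 <= t + tau < n)
        best = max(best, abs(acc) / norm)
    return best
```

At an assumed 48 kHz sampling rate, `split_segments(ir, 48000, 300)` yields 14,400-sample segments, matching the 300 ms segment length quoted above. Sample-wise blending of impulse responses computed at different positions can smear the direct sound and early reflections, because their arrival times differ between grid corners; the sketch does not compensate for this.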

Full Text