A visual representation method, DoAgram, is proposed for multi-dimensional sound scene analysis. Visualizing the sound source localization result gives the end user intuitive information about the estimated direction of arrival (DoA). In addition, image-based deep learning is now widely used in acoustics, so such a visualization can also be used for data augmentation. To analyze the spatial sound scene of a moving source, the method encodes the estimated azimuth and elevation angles of the source, together with the corresponding time stamps and frequencies, as RGB color channels and metadata by mapping the spatial coordinates to a color space. Although the proposed representation is human-interpretable, decoding is required for quantitative analysis; therefore, a time- and frequency-scanning method and a histogram-based DoA estimator are proposed. An experiment is conducted in an anechoic chamber to localize two quadcopter drones with mean angular velocities of 8°/s ± 9°/s (95 % CI) and 25°/s ± 31°/s (95 % CI), respectively, and the spatial sound scene analysis is carried out with the proposed methods. The test results show that the trajectories of the two sources over time are well separated. An additional test is conducted on an open-access audio dataset for machine learning: the cumulative source mapping method is adopted for the spatial sound scene analysis, and the decoded result shows that DoAgram is feasible for machine learning applications.
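To make the encoding idea concrete, below is a minimal sketch of how per-frame DoA estimates could be packed into an RGB image and decoded again. The specific channel assignment (azimuth to R, elevation to G, dominant frequency to B, with the frame index standing in for the time stamp) and the normalization ranges are assumptions for illustration, not the paper's exact DoAgram mapping; the function names `encode_doagram` and `decode_doagram` are hypothetical.

```python
import numpy as np

def encode_doagram(azimuth_deg, elevation_deg, freq_hz, n_frames, freq_max=20000.0):
    """Illustrative sketch: encode per-frame DoA estimates as an RGB image.

    Assumed mapping (not the paper's exact scheme):
      R channel <- azimuth normalized from [0, 360) deg
      G channel <- elevation normalized from [-90, 90] deg
      B channel <- dominant frequency normalized by freq_max
    One image column per time frame; the frame index serves as the time
    stamp, which could alternatively be stored as metadata.
    """
    img = np.zeros((1, n_frames, 3), dtype=np.float32)
    img[0, :, 0] = (np.asarray(azimuth_deg) % 360.0) / 360.0
    img[0, :, 1] = (np.asarray(elevation_deg) + 90.0) / 180.0
    img[0, :, 2] = np.clip(np.asarray(freq_hz) / freq_max, 0.0, 1.0)
    return img

def decode_doagram(img, freq_max=20000.0):
    """Invert the illustrative mapping to recover azimuth, elevation, frequency."""
    azimuth_deg = img[0, :, 0] * 360.0
    elevation_deg = img[0, :, 1] * 180.0 - 90.0
    freq_hz = img[0, :, 2] * freq_max
    return azimuth_deg, elevation_deg, freq_hz

# Example: one source sweeping in azimuth over 100 frames.
t = np.arange(100)
rgb = encode_doagram(azimuth_deg=0.8 * t,
                     elevation_deg=np.full(100, 10.0),
                     freq_hz=np.full(100, 4000.0),
                     n_frames=100)
az, el, fq = decode_doagram(rgb)
```

Because the mapping is invertible, the same image can be read by a human (as a color trajectory) or decoded numerically, which is the property the quantitative scanning and histogram analysis in the paper relies on.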