Late fusion for acoustic scene classification using swarm intelligence

Biyun Ding, Tao Zhang, Ganjun Liu, Lingguo Kong, Yanzhang Geng

doi:10.1016/j.apacoust.2022.108698

Abstract

Acoustic scene classification (ASC) has gained significant interest in recent years due to its diverse applications. However, the performance of ASC remains much lower than that of other audio processing tasks, such as speech recognition and music classification. Various audio signal processing and machine learning methods have been proposed for ASC systems and achieve good performance, which suggests that performance can be significantly improved by combining these methods. Late fusion is a commonly used approach that obtains the final decision for a test instance by fusing the prediction results of different models. However, different models frequently disagree on the prediction for the same data, which degrades the fused performance. This study presents an efficient and effective approach to fusing predictions from multiple sources based on a swarm intelligence algorithm. In this approach, late fusion is formulated as a global optimization problem, and the swarm intelligence algorithm searches for the optimal subset of systems, that is, the subset that yields the best classification performance after late fusion for ASC. The Swarm Intelligence algorithm based Late Fusion (SILF) avoids the performance degradation caused by disagreement among multiple sources and optimizes the combination of sources for fusion. Experiments demonstrate the efficacy of SILF for ASC: it outperforms state-of-the-art late fusion algorithms on the TAU Urban Acoustic Scenes 2019 development dataset.
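To make the idea of late fusion as a subset search concrete, the following is a minimal sketch, not the SILF implementation. The abstract does not name the swarm algorithm or the fusion rule, so this sketch assumes a binary particle swarm optimizer and simple probability averaging; the function names, hyperparameters, and toy data are all hypothetical.

```python
import math
import random


def fuse(probs, mask):
    """Average class probabilities of the selected models; return argmax labels.

    probs[m][i][c] is model m's probability of class c for sample i.
    """
    chosen = [p for p, keep in zip(probs, mask) if keep]
    n_samples, n_classes = len(chosen[0]), len(chosen[0][0])
    preds = []
    for i in range(n_samples):
        avg = [sum(p[i][c] for p in chosen) / len(chosen) for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def accuracy(probs, mask, labels):
    """Fitness of a subset: accuracy of the fused prediction (0.0 if empty)."""
    if not any(mask):
        return 0.0
    preds = fuse(probs, mask)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def select_subset(probs, labels, n_particles=10, n_iters=30, seed=0):
    """Binary PSO (an assumed stand-in) over masks of which models to fuse."""
    rng = random.Random(seed)
    n_models = len(probs)
    pos = [[rng.random() < 0.5 for _ in range(n_models)] for _ in range(n_particles)]
    pos[0] = [True] * n_models  # seed one particle with "fuse everything"
    vel = [[0.0] * n_models for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [accuracy(probs, p, labels) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(n_models):
                # Inertia plus attraction to personal and global best bits.
                v = (0.7 * vel[k][d]
                     + 1.5 * rng.random() * (pbest[k][d] - pos[k][d])
                     + 1.5 * rng.random() * (gbest[d] - pos[k][d]))
                vel[k][d] = max(-6.0, min(6.0, v))
                # Sigmoid of the velocity gives the probability the bit is 1.
                pos[k][d] = rng.random() < 1.0 / (1.0 + math.exp(-vel[k][d]))
            fit = accuracy(probs, pos[k], labels)
            if fit > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[k][:], fit
    return gbest, gbest_fit


# Toy validation data (hypothetical): 3 models, 4 samples, 2 classes.
labels = [0, 1, 0, 1]
probs = [
    [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]],    # mostly reliable model
    [[0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.95, 0.05]],  # confidently wrong model
    [[0.6, 0.4], [0.4, 0.6], [0.4, 0.6], [0.3, 0.7]],    # moderately reliable model
]
all_fit = accuracy(probs, [True] * len(probs), labels)
best_mask, best_fit = select_subset(probs, labels)
print("fuse all:", all_fit, "selected:", best_mask, best_fit)
```

Seeding one particle with the all-models mask means the selected subset is never worse than plain fusion on the data used for the search. In practice the search would run on held-out validation predictions, since a subset tuned on the test data would overstate the gain.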

Full Text