Robust Speaker Identification of IoT based on Stacked Sparse Denoising Auto-encoders

Zhifeng Wang, Surong Duan, Helin Wu, Xinguo Yu, Chunyan Zeng, Yang Yang

doi:10.1109/ithings-greencom-cpscom-smartdata-cybermatics50389.2020.00056



Abstract

Robust speaker identification for the Internet of Things (IoT) has become an active research problem in biometrics, with wide applications in commercial fields and public security. To improve the robustness of speaker identification in IoT settings, we propose a novel Stacked Sparse Denoising Auto-encoder (SSDAE). First, i-vector features are estimated based on the universal background model and the total variability space model. Second, sparsity constraints and a corrupting operation are fused into the general auto-encoder to construct a single sparse denoising auto-encoder. The SSDAE is then built by stacking multiple sparse denoising auto-encoder hidden layers in a layer-wise manner. A Softmax layer is included as the final layer of the resulting deep neural network for classification. Experimental results show that the proposed method outperforms state-of-the-art approaches on the robust speaker identification task across different kinds and levels of noisy input.
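The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the noise type, activation function, sparsity penalty, or layer sizes, so the masking noise, sigmoid units, KL-divergence sparsity term, linear decoder, and all names and dimensions below (`sdae_loss`, `stacked_forward`, 400-dimensional inputs, 10 speakers) are assumptions chosen for illustration. Random vectors stand in for i-vectors, and training is omitted; only the loss of one sparse denoising layer and the forward pass of the stacked network are shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    # Subtract the row-wise maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def corrupt(x, drop_prob, rng):
    # Assumed corruption: masking noise that zeroes each component
    # independently with probability drop_prob.
    mask = rng.random(x.shape) >= drop_prob
    return x * mask


def kl_sparsity(rho, rho_hat, eps=1e-8):
    # Assumed sparsity constraint: KL divergence between a target mean
    # activation rho and the observed mean activation of each hidden unit.
    rho_hat = np.clip(rho_hat, eps, 1.0 - eps)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))


def init_layer(n_in, n_hidden, rng):
    # Returns encoder weights/bias and decoder weights/bias for one layer.
    scale = 1.0 / np.sqrt(n_in)
    return (rng.normal(0.0, scale, (n_in, n_hidden)), np.zeros(n_hidden),
            rng.normal(0.0, scale, (n_hidden, n_in)), np.zeros(n_in))


def sdae_loss(x, params, rng, rho=0.05, beta=0.1, drop_prob=0.2):
    # Loss of a single sparse denoising auto-encoder: encode the corrupted
    # input, reconstruct the clean input, and penalise dense activations.
    W_enc, b_enc, W_dec, b_dec = params
    h = sigmoid(corrupt(x, drop_prob, rng) @ W_enc + b_enc)
    x_hat = h @ W_dec + b_dec  # linear decoder (an assumption)
    recon = 0.5 * np.mean(np.sum((x_hat - x) ** 2, axis=1))
    return recon + beta * kl_sparsity(rho, h.mean(axis=0)), h


def stacked_forward(x, layers, W_out, b_out):
    # Inference through the stacked encoders (no corruption), followed by
    # a softmax layer over speaker identities.
    h = x
    for W_enc, b_enc, _, _ in layers:
        h = sigmoid(h @ W_enc + b_enc)
    return softmax(h @ W_out + b_out)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 400))          # stand-in for 8 i-vectors
layers = [init_layer(400, 128, rng), init_layer(128, 64, rng)]
W_out, b_out = rng.normal(0.0, 0.1, (64, 10)), np.zeros(10)

loss, hidden = sdae_loss(x, layers[0], rng)
probs = stacked_forward(x, layers, W_out, b_out)
print(loss, hidden.shape, probs.shape)
```

In a layer-wise scheme of this kind, each layer would typically be trained on the hidden activations of the layer below it by minimising a loss such as `sdae_loss`, after which the decoders are discarded and the softmax layer is trained on top of the stacked encoders.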

Full Text