One task of nonparallel speech conversion is to convert a source speaker’s speech samples into a target speaker’s speech samples while keeping the content unchanged. Because MaskCycleGAN-VC offers a small model size and superior performance in nonparallel speech conversion, this paper builds on its basic structure and proposes a cyclic boundary method filling in the frame MaskCycleGAN-VC (CBFMCycleGAN-VC) model, which predicts a person’s voice at an older age from voice samples of their younger self. First, the model adds speech preprocessing modules, including a Chebyshev low-pass filter and an adaptive filter, which increase the robustness of the system. Second, it takes the time-domain difference into account in the weight parameters, which makes the mapping of the time-domain structure easier to learn and speeds up convergence. Last, it introduces the circular boundary method to avoid the ringing effect, to strengthen the connection between the filled frame and its adjacent frames, and to obtain a better generator. Simulation results show that the CBFMCycleGAN-VC model is better suited than MaskCycleGAN-VC to the speech conversion task of predicting the voices of elderly people and that it converges faster. The converted voice is also closer to the target speaker’s voice in both the time domain and the frequency domain. With an accuracy rate similar to that of MaskCycleGAN-VC, the mean opinion score (MOS) is 17.5% higher than that of MaskCycleGAN-VC.