Abstract

The purpose of this paper is to build a model and design a speaker recognition system by comprehensively surveying domestic and international research on speaker recognition models and adopting a research method based on deep learning. Its main contents and proposed methods are as follows. For data processing, a public dataset is first selected and downloaded from its official website; each utterance in the dataset is preprocessed, Fbank features are extracted and stored as .npy files, so that the speech is in a format suitable for subsequent input to the model. A ResCNN architecture, a residual convolutional neural network, is used to build the model. The model is trained with a triplet loss function that maps speech into an embedding space, so cosine similarity can be used directly to characterize the distance between two speakers. The speaker verification function provides three different ways to obtain speech: the two acquired utterances are fed into the model, their similarity is judged, and the result is reported. The speaker identification model can likewise obtain speech in three different ways and determine which speaker in the corpus produced it. For the speaker confirmation model, an utterance is played at random and a speaker is selected at random, and the system judges whether the utterance belongs to that speaker.
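
The abstract outlines a pipeline of Fbank feature extraction stored as .npy files, a ResCNN embedding model trained with a triplet loss, and cosine similarity for scoring. The Python sketch below illustrates those steps under stated assumptions: the filter-bank size, margin value, file names, and the rescnn_model placeholder are illustrative rather than the paper's actual configuration, and the feature extraction uses the python_speech_features library as one possible implementation.

import numpy as np
from scipy.io import wavfile
from python_speech_features import logfbank

def wav_to_fbank_npy(wav_path, npy_path, nfilt=64):
    # Read a wav file, extract log Mel filter-bank (Fbank) features,
    # and store them as a .npy array for later model input.
    rate, signal = wavfile.read(wav_path)
    feats = logfbank(signal, samplerate=rate, nfilt=nfilt)  # shape: (frames, nfilt)
    np.save(npy_path, feats.astype(np.float32))
    return feats

def cosine_similarity(emb_a, emb_b):
    # Cosine similarity between two speaker embeddings; larger means more similar.
    return float(np.dot(emb_a, emb_b) /
                 (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))

def triplet_loss(anchor, positive, negative, margin=0.1):
    # Cosine-similarity triplet loss: the anchor should be closer to the
    # positive (same speaker) than to the negative (different speaker)
    # by at least `margin`. The margin value here is an assumption.
    sim_ap = cosine_similarity(anchor, positive)
    sim_an = cosine_similarity(anchor, negative)
    return max(0.0, sim_an - sim_ap + margin)

# Hypothetical usage for speaker verification (rescnn_model stands in for the
# ResCNN embedding network described in the abstract):
# feats_a = wav_to_fbank_npy("speaker_a.wav", "speaker_a.npy")
# feats_b = wav_to_fbank_npy("speaker_b.wav", "speaker_b.npy")
# emb_a, emb_b = rescnn_model(feats_a), rescnn_model(feats_b)
# same_speaker = cosine_similarity(emb_a, emb_b) > threshold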
