Abstract
The total variability model (TVM) has been used extensively to obtain a vector representation of the sources of variability present in a signal. However, recent studies have shown that embeddings derived from a deep neural network (DNN) architecture can provide significant performance improvements over TVM for the speaker verification task. In this letter, we show that TVM can be reformulated in a manner that enables the integration of a DNN within the model. In addition, we show that this TVM architecture can be incorporated as one of the layers within a DNN embedding system. Through experiments on the Speakers in the Wild (SITW) corpus, we show that including a total variability layer in a DNN embedding system provides around 20% relative improvement in equal error rate.