Abstract

In the i-vector model, utterance statistics are extracted from acoustic features using a universal background model (UBM). Each utterance is then mapped to a vector in the total variability space, called an i-vector. The total variability space provides a basis for a low-dimensional, fixed-length representation of a speech utterance. However, the procedure is complicated because the computation of the statistics is interwoven with the machine learning method. We therefore separate the two and propose a simple way to extract i-vectors by applying classical principal component analysis (PCA), factor analysis (FA), and independent component analysis (ICA) to normalized statistics. Results on NIST 2008 telephone data show that the performance is very close to that of the traditional method and that it improves clearly after score fusion.
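The separation of statistics from the projection step can be illustrated with a minimal sketch of the PCA variant. It is not the authors' implementation. The abstract does not say how the statistics are normalized, so the sketch assumes a diagonal-covariance GMM as the UBM, first-order statistics that are centered on the UBM means, whitened, and divided by occupancy, and PCA computed by SVD. All function names are hypothetical and the data are synthetic.

```python
import numpy as np

def normalized_supervector(frames, weights, means, variances):
    """Map one utterance (T x D frames) to a fixed-length vector of size C*D.

    Assumes a diagonal-covariance GMM as the UBM. The normalization is an
    illustrative choice, not necessarily the one used in the paper.
    """
    diff = frames[:, None, :] - means[None, :, :]            # T x C x D
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss                   # T x C
    log_post -= log_post.max(axis=1, keepdims=True)          # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                  # frame posteriors
    n = post.sum(axis=0)                                     # zeroth-order stats, C
    f = post.T @ frames                                      # first-order stats, C x D
    # Center on the UBM means, whiten by the UBM standard deviations,
    # and divide by occupancy so utterance length does not dominate.
    f_centered = (f - n[:, None] * means) / np.sqrt(variances)
    return (f_centered / (n[:, None] + 1e-8)).ravel()

def fit_pca(X, dim):
    """Return the training mean and the top `dim` principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:dim]

def extract(X, mean, basis):
    """Project normalized statistics onto the PCA basis."""
    return (X - mean) @ basis.T

# Synthetic stand-ins for a trained UBM and a set of utterances.
rng = np.random.default_rng(0)
C, D, dim = 4, 3, 5
weights = np.full(C, 1.0 / C)
means = rng.normal(size=(C, D))
variances = rng.uniform(0.5, 1.5, size=(C, D))
utterances = [rng.normal(size=(50, D)) for _ in range(20)]

X = np.stack([normalized_supervector(u, weights, means, variances)
              for u in utterances])
pca_mean, pca_basis = fit_pca(X, dim)
ivectors = extract(X, pca_mean, pca_basis)
```

Under these assumptions, each row of `ivectors` is a fixed-length, low-dimensional representation of one utterance. Replacing `fit_pca` with a factor analysis or independent component analysis estimator would give the other two variants named in the abstract.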

