Abstract

This study compares the performance of a neural-net approach with that of conventional linear methods for feature combination in speaker identity verification (SIV). The experiment combines features consisting of LPC-cepstral coefficient differences and pitch differences for isolated words in a template-matching scenario. The signal features are analyzed over 30-ms frames computed every 10 ms, and the pitch estimate is based on the cepstrum of the LPC residual. Previous work [G. Velius, ICASSP 88, 583–586 (1988)] showed that the Fisher linear discriminant (FLD) was better at feature weighting (for cepstral coefficients only) than several other common linear methods. Results show that the SIV task is performed significantly better when feature combination is done by the neural net than when it is done (i.e., weighted) by the FLD. The neural network architecture used in this experiment was in no way “optimized” for the specific task at hand. An additional finding is that the pitch feature used here, in conjunction with the cepstral coefficients, contributes significantly to the SIV task, reducing the error rate by 13%.
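As an illustration of the FLD baseline only (not the neural-net method studied here), the sketch below assumes each verification trial yields one vector of absolute feature differences between a test utterance and a reference template: cepstral-coefficient differences followed by a pitch difference. The FLD weight vector w ∝ S_w⁻¹(m_impostor − m_true) then collapses these differences into a single weighted distance. The vector layout, the choice of 12 cepstral coefficients, the ridge term, and the synthetic data are all hypothetical assumptions for illustration, not the author's setup or results.

```python
import numpy as np


def fisher_weights(true_diffs: np.ndarray, impostor_diffs: np.ndarray,
                   reg: float = 1e-6) -> np.ndarray:
    """FLD weights separating true-speaker from impostor difference vectors.

    Each row is one trial: |cepstral diffs| followed by |pitch diff|
    (hypothetical layout for illustration).
    """
    m_true = true_diffs.mean(axis=0)
    m_imp = impostor_diffs.mean(axis=0)
    # Within-class scatter S_w: sum of the two unnormalized class scatters
    s_w = (np.cov(true_diffs, rowvar=False) * (len(true_diffs) - 1)
           + np.cov(impostor_diffs, rowvar=False) * (len(impostor_diffs) - 1))
    s_w += reg * np.eye(s_w.shape[0])  # assumed ridge term keeps S_w invertible
    # Direction maximizing between-class over within-class scatter
    w = np.linalg.solve(s_w, m_imp - m_true)
    return w / np.linalg.norm(w)


def weighted_distance(diff: np.ndarray, w: np.ndarray) -> float:
    """Combine feature differences into one score (larger = less likely same speaker)."""
    return float(diff @ w)


# Synthetic example: 12 assumed cepstral coefficients + 1 pitch difference
rng = np.random.default_rng(0)
n_feat = 12 + 1
true_d = np.abs(rng.normal(0.0, 0.5, size=(200, n_feat)))  # same-speaker trials
imp_d = np.abs(rng.normal(0.0, 1.0, size=(200, n_feat)))   # impostor trials
w = fisher_weights(true_d, imp_d)
score = weighted_distance(imp_d[0], w)
```

A trial would then be accepted when its weighted distance falls below a threshold chosen on held-out data. Because the FLD is a single linear projection, it cannot capture interactions between the cepstral and pitch differences, which is one plausible reason a nonlinear combiner such as a neural net could do better.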

