Abstract
Speaker recognition is an emerging task in both commercial and forensic applications. Nevertheless, while in certain applications we can estimate, adapt to or hypothesize about our working conditions, most commercial applications and almost all forensic approaches to speaker recognition remain open problems, for several reasons: environmental conditions are usually rapidly changing or highly degraded, acquisition processes are not always under control, incriminated people exhibit a low degree of cooperativeness, etc. These factors induce a wide range of variability sources in speech utterances, so real-world approaches to speaker identification must take all of them into account. In order to isolate, analyze and measure the effect of some of the main variability sources found in real commercial and forensic applications, and their influence on automatic recognition systems, a large speech database in Castilian Spanish called AHUMADA (/aumáda/) has been designed and acquired under controlled conditions. This paper gives a detailed description of the database and presents some experimental results covering different speech variability factors.