Abstract

Query-by-Example Spoken Term Detection (QbE STD) aims at retrieving data from a speech data repository given an acoustic query containing the term of interest as input. It has recently received much interest due to the high volume of information stored in audio or audiovisual format. QbE STD differs from automatic speech recognition (ASR) and keyword spotting (KWS)/spoken term detection (STD): ASR is interested in all the terms/words that appear in the speech signal, and KWS/STD relies on a textual transcription of the search term to retrieve the speech data. This paper presents the systems submitted to the ALBAYZIN 2012 QbE STD evaluation, held as part of the ALBAYZIN 2012 evaluation campaign within the context of the IberSPEECH 2012 Conference. The evaluation consists of retrieving the speech files that contain the input queries, indicating their start and end timestamps within the appropriate speech file. Evaluation is conducted on a Spanish spontaneous speech database containing a set of talks from MAVIR workshops, which amount to about 7 h of speech in total. We present the database, the metric, and the systems submitted, along with all results and some discussion. Four different research groups took part in the evaluation. Evaluation results show the difficulty of this task, and the limited performance indicates there is still a lot of room for improvement. The best result is achieved by a dynamic time warping-based search over Gaussian posteriorgrams/posterior phoneme probabilities. This paper also compares the systems in order to establish the best technique for this difficult and relatively novel task and to identify promising research directions.

Highlights

  • The ever-increasing volume of heterogeneous speech data stored in audio and audiovisual repositories promotes the development of efficient methods for retrieving the stored information

  • The results of the Query-by-Example Spoken Term Detection (QbE STD) evaluation are presented for every system submitted by the participants, along with the system applied to STD, in terms of maximum term weighted value (MTWV) and actual term weighted value (ATWV), in Tables 6 and 7 for training/development and test data, respectively

  • Among the systems submitted on time for the QbE STD evaluation, system 1 achieved the best performance on test data in terms of both MTWV and ATWV
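The two metrics named in the highlights can be written out as follows. This is a sketch that assumes the evaluation follows the term weighted value definition from the NIST 2006 STD evaluation; the symbols Q (set of queries), θ (decision threshold), N_corr, N_spurious, N_true, and T_speech (seconds of speech) come from that definition, not from this page. NIST used β = 999.9; whether this evaluation used the same value is not stated here.

```latex
\begin{align}
\mathrm{TWV}(\theta) &= 1 - \frac{1}{|Q|} \sum_{q \in Q}
  \left[ P_{\mathrm{miss}}(q,\theta) + \beta \, P_{\mathrm{FA}}(q,\theta) \right] \\
P_{\mathrm{miss}}(q,\theta) &= 1 - \frac{N_{\mathrm{corr}}(q,\theta)}{N_{\mathrm{true}}(q)} \\
P_{\mathrm{FA}}(q,\theta) &= \frac{N_{\mathrm{spurious}}(q,\theta)}{T_{\mathrm{speech}} - N_{\mathrm{true}}(q)} \\
\mathrm{ATWV} &= \mathrm{TWV}(\theta_{\mathrm{actual}}), \qquad
\mathrm{MTWV} = \max_{\theta} \mathrm{TWV}(\theta)
\end{align}
```

Under this definition, ATWV scores the system at the threshold it actually chose, while MTWV is the best value attainable over all thresholds, so the gap between the two reflects how well the threshold was calibrated.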

Summary

Introduction

The ever-increasing volume of heterogeneous speech data stored in audio and audiovisual repositories promotes the development of efficient methods for retrieving the stored information. Spoken term detection aims at finding individual words or sequences of words within audio archives. It relies on a text-based input, commonly the phone transcription of the search term. In QbE STD, we consider the scenario where the user has found some interesting data within a speech data repository (for example, by random browsing or some other method). The user selects one or several speech cuts containing the term of interest (the query), and the system returns other putative hits from the repository (the utterances). Another scenario for QbE STD considers one or several recordings of the term of interest spoken by the user. QbE STD can be employed for building language-independent STD systems [7,8], which are necessary when no or very limited training data are available to build a reliable speech recognition system, since a priori knowledge of the language involved in the speech data is not required.
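To make the query/utterance scenario concrete, the following is a minimal sketch of a subsequence dynamic time warping search over posteriorgrams, the family of techniques the abstract reports as performing best. It is not the implementation of any submitted system: the function name `subsequence_dtw`, the negative-log inner-product local distance, the three-way step pattern, and the normalization by query length are all illustrative assumptions.

```python
import numpy as np

def subsequence_dtw(query, utterance, eps=1e-10):
    """Find the best match of `query` inside `utterance` (hypothetical helper).

    query:     (m, d) array, one posterior vector per query frame
    utterance: (n, d) array, one posterior vector per utterance frame
    Returns (normalized_cost, start_frame, end_frame).
    """
    # Local distance (assumed): -log of the inner product of posterior vectors.
    dist = -np.log(np.maximum(query @ utterance.T, eps))  # shape (m, n)
    m, n = dist.shape

    acc = np.full((m, n), np.inf)        # accumulated cost
    start = np.zeros((m, n), dtype=int)  # utterance frame where each path began

    # The match may start at any utterance frame at no extra cost.
    acc[0, :] = dist[0, :]
    start[0, :] = np.arange(n)

    for i in range(1, m):
        # First column: only a vertical step is possible.
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        start[i, 0] = start[i - 1, 0]
        for j in range(1, n):
            steps = ((i - 1, j - 1), (i - 1, j), (i, j - 1))
            costs = [acc[s] for s in steps]
            k = int(np.argmin(costs))
            acc[i, j] = costs[k] + dist[i, j]
            start[i, j] = start[steps[k]]

    # The match may end at any utterance frame: pick the cheapest end point.
    end = int(np.argmin(acc[m - 1, :]))
    return float(acc[m - 1, end]) / m, int(start[m - 1, end]), end
```

A detection system would typically run such a search for every query against every utterance and keep the hits whose normalized cost falls below a threshold; the returned start and end frames map to the timestamps the evaluation asks for.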
