ALBAYZIN Query-by-example Spoken Term Detection 2016 evaluation

Emilio Sanchis, Fernando Perdigão, Alberto Abad, Doroteo T Toledano, Javier Tejedor, Fernando García-Granada, Laura Docio-Fernandez, Paula Lopez-Otero, Anna Pompili, Jorge Proença

doi:10.1186/s13636-018-0125-9

Abstract

Query-by-example Spoken Term Detection (QbE STD) aims to retrieve data from a speech repository given an acoustic (spoken) query containing the term of interest as the input. This paper presents the systems submitted to the ALBAYZIN QbE STD 2016 Evaluation held as a part of the ALBAYZIN 2016 Evaluation Campaign at the IberSPEECH 2016 conference. Special attention was given to the evaluation design so that a thorough post-analysis of the main results could be carried out. Two different Spanish speech databases, which cover different acoustic and language domains, were used in the evaluation: the MAVIR database, which consists of a set of talks from workshops, and the EPIC database, which consists of a set of European Parliament sessions in Spanish. We present the evaluation design, both databases, the evaluation metric, the systems submitted to the evaluation, the results, and a thorough analysis and discussion. Four different research groups participated in the evaluation, and a total of eight template matching-based systems were submitted. We compare the systems submitted to the evaluation and make an in-depth analysis based on some properties of the spoken queries, such as query length, single-word/multi-word queries, and in-language/out-of-language queries.

Highlights

The huge amount of heterogeneous speech data stored in audio and audiovisual repositories makes it necessary to develop efficient methods for speech information retrieval.
Fusion: The output detections from the Brazilian Portuguese, Spanish, and European Portuguese AUDIMUS phoneme recognizers and from the Czech Brno University of Technology (BUT) phoneme recognizer [76] are fused with the strategy presented in the three feature+dynamic time warping (DTW)-based fusion QbE STD system.
The results suggest that using the target language is not especially beneficial when the acoustic domain changes between the development and test data, since the performance of the language-independent QbE STD systems, i.e., H-SPL-IT-UC-2-LIphnrec+DTW fusion, is better than that of some language-dependent QbE STD systems, i.e., B-L2F-4-pllr fea+DTW fusion.
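The template matching-based systems mentioned above rely on DTW to align a spoken query against a longer utterance. The following is a minimal sketch of subsequence DTW, not the implementation used by any participant. The function names, the Euclidean local distance, the three-step path constraint, and the normalization by query length are all illustrative assumptions, and the features are assumed to be plain numeric vectors (one per frame).

```python
import math


def euclidean(a, b):
    """Local distance between two feature vectors (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, utterance, dist=euclidean):
    """Align the whole query to the best-matching region of the utterance.

    Returns (cost normalized by query length, index of the utterance frame
    where the best alignment ends). The match may start at any utterance
    frame, which is what makes this a subsequence search.
    """
    n, m = len(query), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("query and utterance must be non-empty")

    # First query frame: free start, so no accumulated cost along the utterance.
    prev = [dist(query[0], utterance[j]) for j in range(m)]

    for i in range(1, n):
        # First utterance frame can only be reached from the previous query frame.
        curr = [prev[0] + dist(query[i], utterance[0])]
        for j in range(1, m):
            # Allowed predecessors: vertical, horizontal, and diagonal steps.
            best = min(prev[j], curr[j - 1], prev[j - 1])
            curr.append(best + dist(query[i], utterance[j]))
        prev = curr

    # Free end: pick the utterance frame with the lowest accumulated cost.
    end = min(range(m), key=prev.__getitem__)
    return prev[end] / n, end


if __name__ == "__main__":
    # Toy one-dimensional "features"; the query occurs at frames 2 to 4.
    query = [[0.0], [1.0], [2.0]]
    utterance = [[5.0], [5.0], [0.0], [1.0], [2.0], [5.0]]
    cost, end = subsequence_dtw(query, utterance)
    print(cost, end)
```

In a detection setting, a low normalized cost would be turned into a detection score and compared with a threshold. Fusing several systems, as in the highlight above, would then combine such scores across recognizers.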

Summary

Introduction

The huge amount of heterogeneous speech data stored in audio and audiovisual repositories makes it necessary to develop efficient methods for speech information retrieval. The ALBAYZIN evaluation campaigns are internationally open sets of evaluations supported by the Spanish Network of Speech Technologies (RTTH) and the ISCA Special Interest Group on Iberian Languages (SIG-IL), and they have been held every two years since 2006. These evaluation campaigns provide an objective mechanism for comparing different systems and promote research into speech technologies such as audio segmentation [51], speaker diarization [52], language recognition [53], spoken term detection [54], query-by-example spoken term detection [55, 56], and speech synthesis [57]. The results and discussion are then presented, and the paper is concluded in the final section.

ALBAYZIN QbE STD 2016 evaluation

Evaluation

Results and discussion

Test data results

Comparison with the ALBAYZIN QbE STD 2014 evaluation

Conclusions