Abstract

Within search-on-speech, Spoken Term Detection (STD) aims to retrieve data from a speech repository given a textual representation of a search term. This paper presents an international open evaluation for search-on-speech based on STD in Spanish, together with an analysis of the results. The evaluation was carefully designed so that several analyses of the main results can be carried out. The task consists of retrieving the speech files that contain the search terms and providing, for each detection, its start and end times and a score that reflects the confidence given to the detection. Two different Spanish speech databases were employed: the MAVIR database, which comprises a set of talks from workshops, and the EPIC database, which comprises a set of European Parliament sessions in Spanish. We present the evaluation itself, both databases, the evaluation metric, the systems submitted, the results, and a detailed discussion. Five research groups took part in the evaluation, and ten systems were submitted in total. We compare the submitted systems and provide an in-depth analysis based on several search term properties: term length, within-vocabulary versus out-of-vocabulary terms, single-word versus multi-word terms, and native (Spanish) versus foreign terms.

Highlights

  • Search-on-speech aims to retrieve speech content from audio repositories that matches user queries

  • Spoken Term Detection (STD) is especially important, since it makes it possible to retrieve any speech file that contains a given term from the term's textual representation, and it can be used from any device with text input capabilities

  • Some systems rely on phone-based Automatic Speech Recognition (ASR) to retrieve out-of-vocabulary (OOV) terms, whereas others use the word lattices output by a word-based ASR system to produce OOV term detections

Summary

Introduction

Search-on-speech aims to retrieve speech content from audio repositories that matches user queries. Within search-on-speech, significant research has been conducted on several applications, shown in Table 1, which can be further divided into two categories depending on the input/output of the system. Among these applications, Spoken Term Detection (STD) is especially important, since it makes it possible to retrieve any speech file that contains a given term from the term's textual representation, and it can be used from any device with text input capabilities. The term detector searches for putative detections of the terms in the word/subword lattices, and the decision maker decides whether each detection is a hit or a false alarm (FA) based on certain confidence measures.
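The decision step described above can be illustrated with a minimal sketch. It assumes that each putative detection carries a file identifier, the term, start and end times, and a confidence score, as the evaluation requires. The names `Detection`, `decide`, and `label_detections`, the fixed threshold of 0.5, the 0.5 s time tolerance, and the midpoint matching rule are all hypothetical choices made for illustration; they are not the scoring procedure used in the evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One putative detection (hypothetical record layout)."""
    file_id: str
    term: str
    start: float  # seconds
    end: float    # seconds
    score: float  # confidence, assumed to lie in [0, 1]


def decide(detections, threshold=0.5):
    """Decision maker: keep detections whose score reaches the threshold."""
    return [d for d in detections if d.score >= threshold]


def label_detections(accepted, reference, tolerance=0.5):
    """Label accepted detections as 'hit' or 'FA' against reference occurrences.

    reference is a list of (file_id, term, start, end) tuples. A detection is
    a hit if its midpoint falls inside a not-yet-matched reference occurrence
    of the same term in the same file, widened by `tolerance` seconds.
    This matching rule is an assumption made for illustration.
    """
    used = set()
    labels = []
    # Process the most confident detections first so they claim matches first.
    for det in sorted(accepted, key=lambda d: -d.score):
        mid = (det.start + det.end) / 2
        match = None
        for i, (file_id, term, start, end) in enumerate(reference):
            if i in used:
                continue
            same_item = file_id == det.file_id and term == det.term
            if same_item and start - tolerance <= mid <= end + tolerance:
                match = i
                break
        if match is None:
            labels.append((det, "FA"))
        else:
            used.add(match)
            labels.append((det, "hit"))
    return labels


# Hypothetical data: file names, times, and scores are invented.
detections = [
    Detection("talk01.wav", "madrid", 12.30, 12.85, 0.91),
    Detection("talk01.wav", "madrid", 40.10, 40.60, 0.35),
    Detection("talk02.wav", "madrid", 5.00, 5.50, 0.72),
]
reference = [("talk01.wav", "madrid", 12.25, 12.90)]

accepted = decide(detections, threshold=0.5)
labels = label_detections(accepted, reference)
for det, outcome in labels:
    print(det.file_id, det.term, det.start, det.end, det.score, outcome)
```

In this toy example the low-confidence detection is discarded by the threshold, the detection that overlaps the reference occurrence is labeled a hit, and the remaining accepted detection is labeled an FA. Raising the threshold trades false alarms for misses, which is why the score attached to each detection matters for the evaluation.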
