Spoken Content Retrieval Research Articles

Query-by-Example spoken content retrieval is a challenging task when large volumes of spoken content accumulate in repositories without annotation. In the absence of annotation, retrieval relies on capturing the similarities between the query and the spoken terms directly from their acoustic feature representations. Dynamic Time Warping (DTW)-centric techniques identify the optimal alignment between the acoustic feature representations and thereby capture the similarities between query and spoken terms. Although feasible, DTW-centric techniques produce many false alarms because of the variability that exists in natural speech, which degrades performance. The proposed approach addresses these variability challenges in two stages. In the first stage, a speaker-independent acoustic feature representation is obtained from deep convolutional neural networks, which reduces speaker variability. In the second stage, the similarities between the query and the spoken term are captured using a heuristic search method. The proposed approach was compared with other state-of-the-art methods on the Microsoft Low-Resource Language speech corpus. Across languages, it achieved a 3% improvement in the hit ratio and a 32% reduction in the false alarm ratio.
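To make the DTW baseline mentioned in the abstract concrete, the following is a minimal sketch of classic whole-sequence DTW between two sequences of feature frames. It is not the proposed heuristic search method, and it is not the subsequence variant usually needed to locate a short query inside a long utterance. The function names `euclidean` and `dtw_distance`, the Euclidean frame distance, and the toy one-dimensional frames are all assumptions made for illustration.

```python
import math
from typing import Sequence

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Frame-level distance (an assumed choice; other distances are common)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: Sequence[Frame], utterance: Sequence[Frame]) -> float:
    """Accumulated cost of the best monotonic alignment of two frame sequences."""
    n, m = len(query), len(utterance)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning query[:i] with utterance[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(query[i - 1], utterance[j - 1])
            # Allowed steps: advance the query, advance the utterance, or both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy one-dimensional "feature frames" (hypothetical data).
query = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # same pattern, spoken more slowly
different = [[5.0], [5.0], [5.0]]

print(dtw_distance(query, stretched))
print(dtw_distance(query, different))
```

In this toy case the time-stretched copy aligns at zero cost while the unrelated sequence does not, which is the property that lets DTW tolerate differences in speaking rate. Speaker and channel variability change the frames themselves, which is one reason raw-feature DTW tends to produce false alarms.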

After two successful years at SIGIR in 2007 and 2008, the third workshop on Searching Spontaneous Conversational Speech (SSCS 2009) was held in conjunction with ACM Multimedia 2009. The goal of the SSCS series is to serve as a forum that brings together the disciplines that collaborate on spoken content retrieval, including information retrieval, speech recognition and multimedia analysis. Multimedia collections often contain a speech track, but in many cases it is ignored or not fully exploited for information retrieval. Currently, spoken content retrieval research is expanding beyond highly conventionalized domains such as broadcast news into domains involving speech that is produced spontaneously and in conversational settings. Such speech is characterized by wide variability in speaking styles, subject matter and recording conditions. The work presented at SSCS 2009 included techniques for searching meetings, interviews, telephone conversations, podcasts and spoken annotations. It encompassed a broad range of approaches, including using subword units, exploiting dialogue structure, fusing retrieval models, modeling topics and integrating visual features. Taken together, the workshop demonstrated the high potential of new ideas emerging in the area of speech search and reinforced the need for concentrated research devoted to the classic challenges of spoken content retrieval, many of which remain unsolved.

Articles published on Spoken Content Retrieval

Query-by-Example Spoken Term Detection for Zero-Resource Languages Using Heuristic Search

Report on the 1st Workshop on Audio Collection Human Interaction (AudioCHI 2022) at CHIIR 2022

Spoken content retrieval and understanding using deep learning

Multimedia with a speech track
