Query Frame Research Articles

In this paper, we present an audio fingerprinting system that detects a transformed copy of an audio recording within a large database of recordings. The audio fingerprints in this system encode the positions of salient regions of binary images derived from a spectrogram matrix. The similarity between two fingerprints is defined as the intersection of their elements (i.e., the positions of the salient regions). The search algorithm labels each reference fingerprint in the database with the closest query frame and then counts the number of matching frames when the query is overlaid on the reference. The best match is chosen based on this count. The salient-region fingerprints, together with this nearest-neighbor search, give excellent copy detection results. For a large database, however, this search is time-consuming. To reduce the search time, we accelerate the similarity search using a graphics processing unit (GPU). To speed up the search even further, we use a two-step search based on a clustering technique and a lookup table that reduces the number of comparisons between the query and the reference fingerprints. We also explore the tradeoff between search speed and copy detection performance. The resulting system achieves excellent results on the TRECVID 2009 and 2010 datasets and outperforms several state-of-the-art audio copy detection systems in detection performance, localization accuracy, and run time. In a fast detection scenario with detection speed comparable to Ellis' Shazam-based system, our system achieved the same min NDCR as the NN-based system and significantly better detection accuracy than Ellis' Shazam-based system.
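The nearest-neighbor search described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each fingerprint is a set of integer positions, takes the size of the set intersection as the similarity score, and leaves out the GPU, clustering, and lookup-table steps. The names `similarity`, `label_reference`, and `best_overlay` are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Assumption: a fingerprint is the set of salient-region positions of one frame.
Fingerprint = FrozenSet[int]


def similarity(a: Fingerprint, b: Fingerprint) -> int:
    """Score two fingerprints by the number of positions they share."""
    return len(a & b)


def label_reference(reference: Sequence[Fingerprint],
                    query: Sequence[Fingerprint]) -> List[int]:
    """Label each reference frame with the index of its closest query frame.

    A reference frame that shares no position with any query frame gets -1.
    """
    labels = []
    for ref_fp in reference:
        scores = [similarity(ref_fp, q_fp) for q_fp in query]
        best = max(range(len(query)), key=lambda q: scores[q])
        labels.append(best if scores[best] > 0 else -1)
    return labels


def best_overlay(labels: Sequence[int], query_len: int) -> Tuple[int, int]:
    """Slide the query over the reference and count matching frames.

    At a given offset, query frame q lies on reference frame offset + q;
    that frame counts as a match when its label equals q.
    Returns (best_offset, match_count).
    """
    best_offset, best_count = 0, -1
    for offset in range(-(query_len - 1), len(labels)):
        count = sum(
            1
            for q in range(query_len)
            if 0 <= offset + q < len(labels) and labels[offset + q] == q
        )
        if count > best_count:
            best_offset, best_count = offset, count
    return best_offset, best_count


# Toy data: the query appears, slightly altered, starting at reference frame 1.
query = [frozenset({1, 2, 3}), frozenset({4, 5, 6}), frozenset({7, 8, 9})]
reference = [
    frozenset({100}),
    frozenset({1, 2, 30}),
    frozenset({4, 5, 60}),
    frozenset({7, 8, 90}),
    frozenset({200}),
]
labels = label_reference(reference, query)
offset, count = best_overlay(labels, len(query))
```

In this toy setup the labeling step costs one comparison per (reference frame, query frame) pair, which is the part the abstract says becomes expensive for a large database.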

We present a set of model-based approaches for unsupervised spoken term detection (STD) with spoken queries that require neither speech recognition nor annotated data. This work shows the possibilities of migrating from DTW-based to model-based approaches for unsupervised STD. The proposed approach consists of three components: self-organizing models, query matching, and query modeling. To construct the self-organizing models, repeated patterns are captured and modeled using acoustic segment models (ASMs). In the query matching phase, a document state matching (DSM) approach is proposed that represents documents as ASM sequences, which are matched to the query frames. In this way, not only do the ASMs better model the signal distributions and time trajectories of speech, but the documents also have far fewer states than frames, which leads to a much lower computational load. A novel duration-constrained Viterbi (DC-Vite) algorithm is further proposed for this matching process to handle the speaking rate distortion problem. In the query modeling phase, a pseudo likelihood ratio (PLR) approach is proposed within the pseudo relevance feedback (PRF) framework. A likelihood ratio evaluated with query/anti-query HMMs trained on pseudo-relevant/irrelevant examples is used to verify the detected spoken term hypotheses. The proposed framework demonstrates the usefulness of ASMs for STD in zero-resource settings and the potential of an instantly responding STD system using ASM indexing. The best performance is achieved by integrating DTW-based approaches into the rescoring steps of the proposed framework. Experimental results show an absolute improvement of 14.2% in mean average precision with a 77% reduction in CPU time compared with the segmental DTW approach on a Mandarin broadcast news corpus. Consistent improvements were found on the TIMIT and MediaEval 2011 Spoken Web Search corpora.
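To give a feel for what a duration constraint does when query frames are matched against a sequence of document states, here is a generic sketch. It is not the DC-Vite algorithm from the paper, whose details the abstract does not give. It assumes a precomputed table `loglik[t][s]` (log-likelihood of query frame `t` under document state `s`), a strict left-to-right pass through all states, and the same hypothetical bounds `min_dur` and `max_dur` on how many consecutive frames each state may absorb.

```python
import math
from typing import List, Sequence, Tuple


def duration_constrained_align(
    loglik: Sequence[Sequence[float]], min_dur: int, max_dur: int
) -> Tuple[float, List[int]]:
    """Align T frames to S states in order, with bounded state durations.

    Returns (best total log-likelihood, frames assigned to each state),
    or (-inf, []) when no alignment satisfies the duration bounds.
    """
    if not loglik:
        return -math.inf, []
    n_frames, n_states = len(loglik), len(loglik[0])

    # prefix[s][t] = sum of loglik[0..t-1][s], so a segment sum is one subtraction.
    prefix = [[0.0] * (n_frames + 1) for _ in range(n_states)]
    for s in range(n_states):
        for t in range(n_frames):
            prefix[s][t + 1] = prefix[s][t] + loglik[t][s]

    # best[s][t]: best score with the first s states covering the first t frames.
    best = [[-math.inf] * (n_frames + 1) for _ in range(n_states + 1)]
    back = [[0] * (n_frames + 1) for _ in range(n_states + 1)]
    best[0][0] = 0.0

    for s in range(1, n_states + 1):
        for t in range(1, n_frames + 1):
            for d in range(min_dur, min(max_dur, t) + 1):
                prev = best[s - 1][t - d]
                if prev == -math.inf:
                    continue
                score = prev + prefix[s - 1][t] - prefix[s - 1][t - d]
                if score > best[s][t]:
                    best[s][t] = score
                    back[s][t] = d  # remember the duration chosen for state s

    if best[n_states][n_frames] == -math.inf:
        return -math.inf, []

    # Walk the stored durations backwards to recover the segmentation.
    durations: List[int] = []
    t = n_frames
    for s in range(n_states, 0, -1):
        durations.append(back[s][t])
        t -= back[s][t]
    durations.reverse()
    return best[n_states][n_frames], durations


# Toy table: frames 0-2 fit state 0, frame 3 fits state 1.
toy = [[0.0, -5.0], [0.0, -5.0], [0.0, -5.0], [-5.0, 0.0]]
loose_score, loose_durations = duration_constrained_align(toy, 1, 3)
tight_score, tight_durations = duration_constrained_align(toy, 1, 2)
```

With the looser bound the first state may stretch over three frames; tightening `max_dur` to 2 forces an even split and a lower score, which is the kind of restriction on time warping a duration constraint imposes.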

Articles published on Query Frame

Fast Audio Fingerprinting System Using GPU and a Clustering-Based Technique

A comparative study of video-based object recognition from an egocentric viewpoint

3D Hand Gesture Analysis through a Real-Time Gesture Search Engine

Real-Time RGB-D Camera Relocalization via Randomized Ferns for Keyframe Encoding

Efficient Subframe Video Alignment Using Short Descriptors

Model-Based Unsupervised Spoken Term Detection with Spoken Queries

CRIM’s content-based audio copy detection system for TRECVID 2009

ESTminer: a Web interface for mining EST contig and cluster databases

Robust color histogram descriptors for video segment retrieval and identification

Spatial Similarity Retrieval in Video Databases

Characteristics of multidimensional holographic associative memory in retrieval with dynamically localizable attention
