Abstract

This paper studies a novel audio segmentation-by-classification approach based on factor analysis. The proposed technique compensates for within-class variability by using class-dependent factor loading matrices, and it obtains scores by computing the log-likelihood ratio of each class model against a non-class model over fixed-length windows. These scores are then smoothed by means of different back-end systems to yield longer contiguous segments of the same class. Unlike previous solutions, our proposal neither relies on specific acoustic features nor requires a hierarchical structure. The proposed method is applied to segment and classify audio from TV shows into five acoustic classes: speech, music, speech with music, speech with noise, and others. Compared with a hierarchical system that uses specific acoustic features, the proposed technique achieves a significant error reduction.
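
As a rough illustration of the scoring and smoothing stages, the following Python sketch starts from per-window class log-likelihoods; the factor-analysis models that would produce them are not shown. Two choices here are assumptions, not the paper's method: each non-class model is approximated as an equal-weight mixture of the other classes, and the back-end is a simple majority-vote filter over window decisions. Function names, class identifiers such as `speech_music`, the window length, and the filter width are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical identifiers for the five acoustic classes named in the abstract.
CLASSES = ["speech", "music", "speech_music", "speech_noise", "others"]


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def window_llrs(loglik: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Per-window log-likelihood ratio of each class against a non-class model.

    loglik holds one {class: log p(window | class)} dict per fixed-length window.
    ASSUMPTION: the non-class model is an equal-weight mixture of the other classes.
    """
    scores = []
    for ll in loglik:
        row = {}
        for c in ll:
            others = [ll[o] for o in ll if o != c]
            non_class = logsumexp(others) - math.log(len(others))
            row[c] = ll[c] - non_class
        scores.append(row)
    return scores


def smooth_labels(scores: List[Dict[str, float]], half_width: int = 1) -> List[str]:
    """Hypothetical back-end: best class per window, then a majority-vote filter."""
    raw = [max(row, key=row.get) for row in scores]
    smoothed = []
    for i, label in enumerate(raw):
        votes = Counter(raw[max(0, i - half_width): i + half_width + 1])
        best = max(votes.values())
        # Keep the window's own label on ties; otherwise take the majority label.
        smoothed.append(label if votes[label] == best else votes.most_common(1)[0][0])
    return smoothed


def to_segments(labels: List[str], window_sec: float = 1.0) -> List[Tuple[float, float, str]]:
    """Merge runs of identical window labels into (start_sec, end_sec, class) segments."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start * window_sec, i * window_sec, labels[start]))
            start = i
    return segments


if __name__ == "__main__":
    # Toy input: speech windows with one spurious "music" window, then music.
    pattern = ["speech"] * 3 + ["music"] + ["speech"] * 2 + ["music"] * 4
    loglik = [{c: (-1.0 if c == lab else -5.0) for c in CLASSES} for lab in pattern]
    labels = smooth_labels(window_llrs(loglik))
    print(to_segments(labels))  # [(0.0, 6.0, 'speech'), (6.0, 10.0, 'music')]
```

The isolated "music" window is absorbed into the surrounding speech segment, which is the effect the smoothing stage is meant to have. Any other back-end could be substituted for `smooth_labels` without changing the scoring step.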

Highlights

  • Due to the increase in audiovisual content, it becomes necessary to use automatic tools for different tasks such as analysis, indexation, search, and information retrieval

  • To evaluate the classification accuracy of the classical factor analysis (FA) system versus a Gaussian mixture model (GMM) system, segment boundaries are taken from the ground truth, so the system only decides the class of each segment and no segmentation error is introduced

  • The proposed system is based on a factor analysis (FA) approach that compensates for within-class variability with one factor loading matrix per class
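
The oracle-boundary evaluation described in the second highlight can be sketched as follows. The sketch takes per-window class scores (for example, log-likelihood ratios) and picks, for each ground-truth segment, the class with the highest summed score; summing is an assumption, since the highlights do not state the aggregation rule. Accuracy counts each segment equally. The names `classify_segment` and `oracle_accuracy` are hypothetical; scores from the FA system and from the GMM baseline could be passed in turn to compare the two.

```python
from typing import Dict, List, Tuple

# (first_window, end_window_exclusive, true_class); boundaries come from the ground truth.
Segment = Tuple[int, int, str]


def classify_segment(scores: List[Dict[str, float]], start: int, end: int) -> str:
    """Return the class with the highest summed score over windows [start, end).

    ASSUMPTION: segment decisions aggregate window scores by summation.
    """
    totals: Dict[str, float] = {}
    for row in scores[start:end]:
        for c, s in row.items():
            totals[c] = totals.get(c, 0.0) + s
    return max(totals, key=lambda c: totals[c])


def oracle_accuracy(scores: List[Dict[str, float]], segments: List[Segment]) -> float:
    """Fraction of ground-truth segments assigned their true class (no segmentation error)."""
    correct = sum(classify_segment(scores, s, e) == label for s, e, label in segments)
    return correct / len(segments)


if __name__ == "__main__":
    scores = [
        {"speech": 2.0, "music": -1.0},
        {"speech": 0.5, "music": 1.0},
        {"speech": -2.0, "music": 3.0},
        {"speech": -1.0, "music": 0.5},
    ]
    truth = [(0, 2, "speech"), (2, 3, "music"), (3, 4, "speech")]
    print(oracle_accuracy(scores, truth))  # 0.6666666666666666
```

A duration-weighted variant would weight each segment by `end - start` instead of counting segments equally.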
