Probabilistic Segmentation of Folk Music Recordings

Ciril Bohak, Matija Marolt

doi:10.1155/2016/8297987

Abstract

The paper presents a novel method for automatic segmentation of folk music field recordings. The method is based on a distance measure that uses dynamic time warping to cope with tempo variations and a dynamic programming approach to handle pitch drifting; the measure is used to find similarities and to estimate the length of the repeating segment. A probabilistic framework based on a hidden Markov model (HMM) is used to find segment boundaries by searching for an optimal match between the expected segment length, between-segment similarities, and likely locations of segment beginnings. Several current state-of-the-art approaches for segmentation of commercial music are evaluated, and their weaknesses when dealing with folk music, such as intolerance to pitch drift and variable tempo, are exposed. The proposed method is evaluated and its performance analyzed on a collection of 206 folk songs of different ensemble types: solo, two- and three-voiced, choir, instrumental, and instrumental with singing. It outperforms current commercial music segmentation methods for noninstrumental music and is on a par with the best for instrumental recordings. The method is also comparable to a more specialized method for segmentation of solo singing folk music recordings.
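To illustrate why dynamic time warping tolerates tempo variation, the following is a minimal sketch of a textbook DTW cost between two feature sequences. It is not the distance measure defined in the paper: the function name `dtw_distance`, the Euclidean frame distance, the length normalisation, and the toy one-dimensional "melodies" are all assumptions made for illustration, and the pitch-drift handling is not covered.

```python
import math


def dtw_distance(a, b):
    """Length-normalised DTW cost between two sequences of feature vectors.

    Hypothetical helper for illustration; frames are compared with the
    Euclidean distance, which is an assumption, not the paper's choice.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # A frame may be matched, or either sequence may be stretched.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


# Toy data: `slow` is `melody` performed at half tempo; `other` is unrelated.
melody = [(0.0,), (1.0,), (2.0,), (1.0,)]
slow = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,), (1.0,), (1.0,)]
other = [(2.0,), (0.0,), (2.0,), (0.0,)]

same_tune = dtw_distance(melody, slow)
different_tune = dtw_distance(melody, other)
```

In this toy case the half-tempo repetition aligns with zero cost, while the unrelated sequence does not, which is the property a between-segment similarity needs when singers change tempo between stanzas.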

Highlights

Structure is an inherent part of most music we listen to
Based on shortcomings of existing methods, we decided to consider the following folk music specifics when designing our segmentation method: (1) tolerance to tempo deviations in calculation of between-segment similarities, (2) tolerance to pitch drifting in calculation of between-segment similarities, (3) tolerance to noise and performer errors that may occur at different locations in a song, (4) songs that are structured as repetitions of one melodic or harmonic pattern, and (5) focus on segmentation of noninstrumental music, which represents a greater challenge for current methods than instrumental recordings
Their results are slightly better (F1 measure of 0.872 for Solo Onder de Groene Linde (OGL)); we should note that the method is based on F0-enhanced Chroma Energy Normalized Statistics (CENS) features, tuned for solo singing, so we cannot estimate how it would perform for other ensemble types
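For readers unfamiliar with how an F1 measure is typically computed for segment boundaries, here is a minimal sketch. It assumes that an estimated boundary counts as correct when it lies within a fixed tolerance of an unmatched reference boundary; the function name `boundary_f1`, the greedy matching, the tolerance value, and the example times are hypothetical and are not taken from the paper.

```python
def boundary_f1(estimated, reference, tolerance=1.0):
    """F1 measure for boundary times (seconds), matched within `tolerance`.

    Each reference boundary can be matched by at most one estimate.
    The tolerance default is an assumption made for illustration.
    """
    est = sorted(estimated)
    ref = sorted(reference)
    hits = 0
    i = j = 0
    # Greedy two-pointer matching over the sorted boundary lists.
    while i < len(est) and j < len(ref):
        if abs(est[i] - ref[j]) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif est[i] < ref[j]:
            i += 1  # estimate has no reference boundary close enough
        else:
            j += 1  # reference boundary was missed
    if hits == 0:
        return 0.0
    precision = hits / len(est)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotation and estimate for one recording.
reference_times = [0.0, 10.0, 20.0, 30.0]
estimated_times = [0.5, 9.0, 25.0]
score = boundary_f1(estimated_times, reference_times, tolerance=1.0)
```

Here two of the three estimates fall within the tolerance of a reference boundary, so precision is 2/3, recall is 2/4, and the F1 measure is 4/7.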

Summary

Introduction

Structure is an inherent part of most music we listen to. It is what we recognize as repeating patterns of different musical modalities such as beat, rhythm, melody, harmony, or lyrics. The method was applied to audio recordings for detection of repeating parts. It mimics short-term memory by encapsulating the most recent parts of a signal, assesses homogeneities and repetitions by pairwise comparison, and computes structure features and differences of these features, which yield a novelty measure whose peaks indicate boundary estimates. The same authors presented a different approach [17] to music segmentation that relies on an ordinal linear discriminant analysis method for learning feature projections to improve time-series clustering. They propose latent structural repetition features, which provide a fixed-dimensional representation of global song structure and facilitate modeling across multiple songs. We present a novel method for segmentation of folk music field recordings, which include individual songs.

Evaluating the State of the Art

The Proposed Method

Evaluation

Conclusions