Abstract

Concert recordings of Carnatic music are often continuous and unsegmented. At present, these recordings are manually segmented into items for making CDs. The objective of this paper is to develop algorithms that segment continuous concert recordings into items using applause as a cue. Owing to the ‘here and now’ nature of applause, the number of applauses exceeds the number of items in the concert, so a single item can be fragmented into several segments. In the first part of the paper, applause locations are identified using time- and spectral-domain features, namely short-time energy, zero-crossing rate, spectral flux and spectral entropy. In the second part, inter-applause segments are merged if they belong to the same item. The main component of every item in a concert is a composition, characterised by an ensemble of vocal (or main instrument), violin (optional) and percussion. Inter-applause segments are classified into four classes, namely vocal solo, violin solo, composition and thaniavarthanam, using tonic-normalised cent filter-bank cepstral coefficients. Adjacent composition segments are merged into a single item if they belong to the same melody. Item-level metadata for the concert, available from listeners, is matched to the segmented audio. The applauses are further classified by strength using the Cumulative Sum (CUSUM) measure, and the locations of the top three highlights of every concert are documented. The performance of the proposed approaches to applause identification, inter-applause classification and mapping of items is evaluated on 50 live recordings of Carnatic music concerts. The applause identification accuracy is 99%, the inter- and intra-item classification accuracy is 93%, and the mapping accuracy is 95%.
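
The abstract does not give implementation details for the applause detector; the sketch below is a minimal illustration, assuming standard frame-level definitions of the four features it names (short-time energy, zero-crossing rate, spectral flux and spectral entropy) and purely illustrative decision thresholds. The function names and parameter values are not taken from the paper.

```python
import numpy as np

def frame_features(x, sr, frame_len=0.025, hop=0.010):
    """Per-frame short-time energy, zero-crossing rate, spectral flux
    and spectral entropy for a mono signal x sampled at sr Hz."""
    n, h = int(frame_len * sr), int(hop * sr)
    window = np.hanning(n)
    prev_mag, feats = None, []
    for start in range(0, len(x) - n, h):
        frame = x[start:start + n] * window
        energy = np.sum(frame ** 2) / n
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
        mag = np.abs(np.fft.rfft(frame))
        p = mag / (np.sum(mag) + 1e-12)            # normalised magnitude spectrum
        entropy = -np.sum(p * np.log2(p + 1e-12))  # flat, noise-like spectra score high
        flux = 0.0 if prev_mag is None else np.sum((mag - prev_mag) ** 2)
        prev_mag = mag
        feats.append((energy, zcr, flux, entropy))
    return np.array(feats)

def applause_mask(feats, zcr_thr=0.1, ent_thr=6.0):
    # Applause is broadband and noise-like: high zero-crossing rate and
    # high spectral entropy. Thresholds are illustrative, not from the paper.
    return (feats[:, 1] > zcr_thr) & (feats[:, 3] > ent_thr)
```

Frames flagged by the mask can then be grouped into contiguous applause regions, whose boundaries delimit the inter-applause segments used in the second stage.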
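Likewise, the abstract does not spell out how the Cumulative Sum statistic grades applause strength; one plausible reading, sketched here purely as an assumption, accumulates frame energies over each detected applause region so that louder and longer applause yields a larger value, and the three largest values mark the concert highlights.

```python
import numpy as np

def applause_strength(frame_energy):
    """Strength of one applause region: final value of the cumulative
    sum of its frame energies (louder and longer -> larger)."""
    return float(np.cumsum(frame_energy)[-1])

def top_highlights(applause_regions, k=3):
    """Indices of the k strongest applause regions, taken as highlights.
    `applause_regions` is a list of per-region frame-energy arrays."""
    strengths = [applause_strength(e) for e in applause_regions]
    return list(np.argsort(strengths)[::-1][:k])
```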
