Zipf's Law in Short-Time Timbral Codings of Speech, Music, and Environmental Sound Signals

Martín Haro, Álvaro Corral, Perfecto Herrera, Joan Serrà

doi:10.1371/journal.pone.0033993

Abstract

Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis of the intrinsic characteristics of the most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words while, in the case of the environmental sounds, these database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources.
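The rank-frequency analysis described in the abstract can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the authors' pipeline: it uses the classic memoryless Yule-Simon process (the paper reports a variant with memory), integer labels stand in for timbral code-words, the exponent is estimated with a simple least-squares fit in log-log space (the paper's fitting procedure may differ), and all function names and parameters are hypothetical.

```python
import math
import random
from collections import Counter


def simulate_yule_simon(n_tokens, p_new, seed=0):
    """Generate tokens with the classic (memoryless) Yule-Simon process.

    With probability p_new a brand-new code-word is emitted; otherwise a
    previously emitted token is copied uniformly at random, so existing
    code-words are reused in proportion to their current counts.
    """
    rng = random.Random(seed)
    tokens = [0]  # start with a single code-word, labeled 0
    next_label = 1
    while len(tokens) < n_tokens:
        if rng.random() < p_new:
            tokens.append(next_label)
            next_label += 1
        else:
            tokens.append(rng.choice(tokens))
    return tokens


def rank_frequency(tokens):
    """Return code-word counts sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def fit_zipf_exponent(freqs, max_rank=None):
    """Estimate the Zipf exponent as minus the least-squares slope of
    log(frequency) against log(rank), optionally using only the top ranks.
    """
    freqs = freqs[:max_rank] if max_rank else freqs
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    tokens = simulate_yule_simon(n_tokens=50_000, p_new=0.1, seed=42)
    freqs = rank_frequency(tokens)
    alpha = fit_zipf_exponent(freqs, max_rank=100)
    print(f"distinct code-words: {len(freqs)}, fitted exponent: {alpha:.2f}")
```

A log-log least-squares fit is convenient for a quick look but is known to be a rough estimator for heavy-tailed data; it is used here only to keep the example short.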

Highlights

Heavy-tailed distributions pervade data coming from processes studied in several scientific disciplines such as physics, engineering, computer science, geoscience, biology, economics, linguistics, and social sciences [1,2,3,4,5,6].
Several works have shown heavy-tailed distributions of data extracted from symbolic representations of music such as scores [20,21] and MIDI files [22,23,24]. (MIDI is an industry-standard protocol to encode musical information; this protocol does not store sound but information about musical notes, durations, volume level, instrument name, etc.)
Symbolic representations are only available for a small portion of the world’s music and are non-standard and difficult to define for other types of sounds such as human speech, animal vocalizations, and environmental sounds.

Summary

Introduction

Heavy-tailed distributions (e.g. power-law or log-normal) pervade data coming from processes studied in several scientific disciplines such as physics, engineering, computer science, geoscience, biology, economics, linguistics, and social sciences [1,2,3,4,5,6]. This ubiquitous presence has increasingly attracted research interest over the last decades, especially in trying to find a unifying principle that links and governs such disparate complex systems [5,6,7,8,9,10,11,12,13,14,15,16,17]. Some works can be found describing heavy-tailed distributions of sound amplitudes from music, speech, and crackling noise [2,26,27].
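For reference, the power-law case mentioned above is commonly written in the following two equivalent forms. The symbols are assumptions introduced here for illustration and may not match the paper's notation: $z(r)$ is the frequency of the item of rank $r$, $P(z)$ is the probability that an item has frequency $z$, and $\alpha$ and $\beta$ are the corresponding exponents.

```latex
% Rank-frequency form (Zipf's law): frequency decays as a power of rank
z(r) \propto r^{-\alpha}

% Equivalent distribution of frequencies, valid in the tail
P(z) \propto z^{-\beta}, \qquad \beta = 1 + \frac{1}{\alpha}
```

An exponent $\alpha$ close to one, as reported in the abstract, therefore corresponds to $\beta$ close to two.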


Journal: PloS one	Publication Date: Mar 29, 2012
Citations: 63	License type: CC BY 4.0
