Abstract

We present an innovative tempo estimation system that processes acoustic audio signals and does not use any high-level musical knowledge. Our proposal relies on a harmonic + noise decomposition of the audio signal by means of a subspace analysis method. A technique that measures the degree of musical accentuation as a function of time is then developed and applied separately to the harmonic and noise parts of the input signal. This is followed by a periodicity estimation block that calculates the salience of musical accents for a large number of potential periods. Next, a multipath dynamic programming stage searches among all the potential periodicities for the candidates that are most consistent through time, and finally the most energetic candidate is selected as the tempo. Our proposal is validated using a manually annotated test database containing 961 music signals from various musical genres. In addition, the performance of the algorithm under different configurations is compared, and its robustness when processing signals of degraded quality is measured.
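The periodicity estimation step described above can be illustrated with a minimal sketch. It assumes an accent signal has already been computed at a known frame rate, and it uses plain autocorrelation as the salience measure, which is a stand-in chosen for brevity and not necessarily the method used in the paper. The function names, the 60 to 200 BPM search range, and the rule of picking the single strongest lag are all assumptions made for illustration; the paper's dynamic programming stage is not reproduced here.

```python
import numpy as np

def period_salience(accent, frame_rate, bpm_min=60.0, bpm_max=200.0):
    """Return candidate lags (in frames) and their autocorrelation salience.

    Hypothetical helper: plain autocorrelation is an illustrative choice.
    """
    accent = np.asarray(accent, dtype=float)
    accent = accent - accent.mean()              # remove the DC offset
    full = np.correlate(accent, accent, mode="full")
    acf = full[len(accent) - 1:]                 # keep lags 0 .. N-1
    # Convert the assumed tempo range into a range of lags.
    lag_min = int(np.ceil(frame_rate * 60.0 / bpm_max))
    lag_max = int(np.floor(frame_rate * 60.0 / bpm_min))
    lag_max = min(lag_max, len(accent) - 1)
    lags = np.arange(lag_min, lag_max + 1)
    return lags, acf[lags]

def estimate_tempo(accent, frame_rate, bpm_min=60.0, bpm_max=200.0):
    """Pick the most salient candidate period and convert it to BPM."""
    lags, salience = period_salience(accent, frame_rate, bpm_min, bpm_max)
    best_lag = lags[np.argmax(salience)]
    return 60.0 * frame_rate / best_lag

# Synthetic accent signal: one impulse every 0.5 s at 100 frames per second.
frame_rate = 100.0
accent = np.zeros(1000)
accent[::50] = 1.0
tempo = estimate_tempo(accent, frame_rate)
print(tempo)
```

On real music a single global maximum is often ambiguous between a tempo and its multiples or submultiples, which is one reason a system may track several candidate periods through time before committing to one.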

Highlights

  • The continuously growing size of digital audio information increases the difficulty of its access and management, hampering its practical usefulness

  • When the methods disagreed, preference was given to the spectral sum (SS)

  • It must be remarked that the combination of the system components is rather crude, which may explain why only a small improvement in performance is obtained

Summary

Introduction

The continuously growing size of digital audio information increases the difficulty of its access and management, hampering its practical usefulness. One of the subjects that has attracted much attention in this field is the extraction of rhythmic information from music. It is difficult to provide a rigorous universal definition of rhythm, but for our needs we can quote Parncutt [1]: “a musical rhythm is an acoustic sequence evoking a sensation of pulse”, which refers to all possible rhythmic levels, that is, pulse rates, evoked in the mind of a listener (see Figure 1). The concept of phenomenal accent has great relevance in this context; Lerdahl and Jackendoff [3] define it as “the moments of musical stress in the raw signal (who) serve as cues from which the listener attempts to extrapolate a regular pattern.” In practice, we consider as phenomenal accents all the discrete events in the audio stream where there is a marked change in any of the perceived psychoacoustical properties of sound, that is, loudness, timbre, and pitch.
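As a rough illustration of what detecting a "marked change" in the audio stream can look like, the sketch below computes a half-wave rectified spectral flux: the summed increase in short-time spectral magnitude between consecutive frames. This is a widely used simplification that mostly responds to changes in loudness and spectral content; it is not the accentuation measure developed in the paper. The function name, frame length, and hop size are assumptions chosen for the example.

```python
import numpy as np

def spectral_flux(signal, frame_len=1024, hop=512):
    """Half-wave rectified spectral flux, one value per frame transition.

    Hypothetical helper for illustration; parameters are assumed defaults.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    mag = np.abs(np.fft.rfft(frames, axis=1))    # magnitude spectrogram
    diff = np.diff(mag, axis=0)                  # frame-to-frame change
    # Keep only increases in magnitude, summed over frequency bins.
    return np.maximum(diff, 0.0).sum(axis=1)

# One second of silence followed by one second of a 440 Hz tone at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
signal = np.concatenate([np.zeros(sr), np.sin(2 * np.pi * 440.0 * t)])
flux = spectral_flux(signal)
onset_frame = int(np.argmax(flux))
print(onset_frame, onset_frame * 512 / sr)
```

The largest flux value falls at the frame transition where the tone begins, which is the kind of discrete event the text calls a phenomenal accent. Changes in pitch or timbre at constant loudness would need a richer measure than this sketch provides.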
