Abstract

We systematically analyze audio key finding to determine the factors important to system design and to the selection and evaluation of solutions. First, we present a basic system, the fuzzy analysis spiral array center of effect generator (FACEG) algorithm, with three key determination policies: nearest-neighbor (NN), relative distance (RD), and average distance (AD). AD achieved a 79% accuracy rate in an evaluation on 410 classical pieces, more than 8% higher than RD and NN. We show why audio key finding sometimes outperforms symbolic key finding. We next propose three extensions to the basic key-finding system to improve performance: the modified spiral array (mSA), fundamental frequency identification (F0), and post-weight balancing (PWB). The extensions are evaluated using Chopin's Preludes (Romantic repertoire was the most challenging). F0 provided the greatest improvement in the first 8 seconds, while mSA gave the best performance after 8 seconds. Case studies examine the pieces for which all systems were correct, or all were incorrect.

Highlights

  • Our goal in this paper is to present a systematic analysis of audio key finding in order to determine the factors important to system design, and to explore the strategies for selecting and evaluating solutions

  • The modified spiral array model is built with the frequency features of audio, the fundamental frequency identification scheme emphasizes the bass line of the piece, and the post-weight balancing uses the knowledge of music theory to adjust the pitch-class distribution

  • We have presented a fundamental audio key-finding system, FACEG, with three key determination policies: nearest-neighbor (NN), relative distance (RD), and average distance (AD)

Summary

Introduction

Our goal in this paper is to present a systematic analysis of audio key finding in order to determine the factors important to system design, and to explore the strategies for selecting and evaluating solutions. We present a basic audio key-finding system, FACEG, first proposed in [3], which combines a fuzzy analysis technique with the spiral array center of effect generator (CEG) algorithm [1, 2]. Based on the evaluation of the basic system, we provide three extensions at different stages of the system: the modified spiral array (mSA) model, fundamental frequency identification (F0), and post-weight balancing (PWB). The modified spiral array model is built from the frequency features of audio, the fundamental frequency identification scheme emphasizes the bass line of the piece, and post-weight balancing uses knowledge of music theory to adjust the pitch-class distribution. The alternative systems are evaluated statistically, using average results on large datasets, and through case studies of score-based analyses.
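To make the CEG idea with the nearest-neighbor (NN) policy concrete, here is a minimal Python sketch. It is not the FACEG implementation. It assumes the usual spiral array geometry, in which pitches a fifth apart sit a quarter turn apart on a helix. The radius, the rise per fifth, the weights in `W`, and all function names are illustrative assumptions rather than values from the paper. The sketch covers major keys only, and it takes pitch weights indexed by line-of-fifths position (C = 0, G = 1, F = -1), so the fuzzy analysis step that derives such weights from audio is left out.

```python
import math

R = 1.0                    # spiral radius (illustrative assumption)
H = math.sqrt(2.0 / 15.0)  # rise per fifth (illustrative assumption)
W = (0.5, 0.3, 0.2)        # illustrative weights, not the paper's values


def pitch(k):
    """Position of the pitch at line-of-fifths index k (C = 0, G = 1, F = -1)."""
    angle = k * math.pi / 2
    return (R * math.sin(angle), R * math.cos(angle), k * H)


def combine(points, weights):
    """Weighted average of 3-D points, i.e. a center of effect."""
    total = sum(weights)
    return tuple(
        sum(w * p[i] for p, w in zip(points, weights)) / total for i in range(3)
    )


def major_chord(k):
    # Root, fifth (k + 1), and major third (k + 4) of the triad on index k.
    return combine([pitch(k), pitch(k + 1), pitch(k + 4)], W)


def major_key(k):
    # Tonic, dominant, and subdominant chords of the major key on index k.
    return combine([major_chord(k), major_chord(k + 1), major_chord(k - 1)], W)


def nearest_major_key(weights_by_index, candidates=range(-6, 7)):
    """NN policy: return the candidate key index closest to the center of effect."""
    ce = combine(
        [pitch(k) for k in weights_by_index], list(weights_by_index.values())
    )
    return min(candidates, key=lambda k: math.dist(ce, major_key(k)))


# Weights for a passage dwelling on C, G, and E, with the other C major scale tones.
c_major_passage = {0: 4, 1: 3, 4: 2, -1: 1, 2: 1, 3: 1, 5: 1}
print(nearest_major_key(c_major_passage))
```

In this sketch the returned index is a position on the line of fifths, so 0 stands for C major and 1 for G major. The RD and AD policies use the same distances to the key representations but apply different decision rules, which are not reproduced here.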
