Abstract

Automatic harmonic analysis has been an enduring focus of the MIR community, and has enjoyed a particularly vigorous revival of interest in the machine-learning age. We focus here on the specific case of Roman numeral analysis which, by virtue of requiring key/functional information in addition to chords, may be viewed as an acutely challenging use case. We report on three main developments. First, we provide a new meta-corpus bringing together all existing Roman numeral analysis datasets; this offers greater scale and diversity, not only of the music represented, but also of human analytical viewpoints. Second, we examine best practices in the encoding of pitch, time, and harmony for machine learning tasks. The main contribution here is the introduction of full pitch spelling to such a system, an absolute must for the comprehensive study of musical harmony. Third, we devise and test several neural network architectures and compare their relative accuracy. In the best-performing of these models, convolutional layers gather the local information needed to analyse the chord at a given moment while a recurrent part learns longer-range harmonic progressions. Altogether, our best representation and architecture produce a small but significant improvement on overall accuracy while simultaneously integrating full pitch spelling. This enables the system to retain important information from the musical sources and provide more meaningful predictions for any new input.

Highlights

  • Our best representation and architecture produce a small but significant improvement on overall accuracy while simultaneously integrating full pitch spelling

  • For the ABC corpus, we used the version reported by Tymoczko et al. (2019)

  • We propose a third, ‘compromise’ option reflecting the special role of the bass in tonal harmony in defining both chordal inversion and other important matters for harmonic progression


Summary


The Bach example begins to show that more complex contexts can run these rules into self-contradiction. We propose a third, ‘compromise’ option reflecting the special role of the bass in tonal harmony in defining both chordal inversion and other important matters for harmonic progression. In this case, music is encoded with two vectors per frame: one with the lowest note and another with the total pitch content. Our second constraint limits the keys to a narrower range, from C♭ to C♯ major and their relative minors (A♭ to A♯), such that the diatonic pitches are limited to single flats and sharps. We do this to reduce the computational load without losing actual information, as real pieces very rarely go outside these key boundaries. One needs to strike a balance …
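The two-vectors-per-frame idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the 35-entry spelled vocabulary (seven letters, each from double flat to double sharp), the helper names `midi_number` and `encode_frame`, and the choice to concatenate the bass and full vectors into one array.

```python
import numpy as np

# Assumed vocabulary: 7 letters x 5 accidentals = 35 spelled pitch classes.
LETTERS = "CDEFGAB"
ACCIDENTALS = ["bb", "b", "", "#", "##"]
SPELLED = [letter + acc for letter in LETTERS for acc in ACCIDENTALS]
INDEX = {name: i for i, name in enumerate(SPELLED)}

# Semitones above C for each natural letter, and the shift for each accidental.
NATURAL_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTAL_SHIFT = {"bb": -2, "b": -1, "": 0, "#": 1, "##": 2}


def midi_number(name, octave):
    """Sounding pitch of a spelled note, used only to find the lowest note."""
    letter, acc = name[0], name[1:]
    return 12 * (octave + 1) + NATURAL_SEMITONES[letter] + ACCIDENTAL_SHIFT[acc]


def encode_frame(notes):
    """Encode one frame as [bass one-hot | full multi-hot] over spelled names.

    notes: list of (spelled name, octave) pairs sounding in the frame.
    An empty frame yields an all-zero vector.
    """
    bass = np.zeros(len(SPELLED), dtype=np.float32)
    full = np.zeros(len(SPELLED), dtype=np.float32)
    if notes:
        lowest = min(notes, key=lambda note: midi_number(*note))
        bass[INDEX[lowest[0]]] = 1.0
        for name, _octave in notes:
            full[INDEX[name]] = 1.0
    return np.concatenate([bass, full])


# E major triad in root position: E3, G#4, B4, E5.
frame = encode_frame([("E", 3), ("G#", 4), ("B", 4), ("E", 5)])
```

Because the vocabulary is spelled rather than chromatic, G♯ and A♭ occupy different positions, which is the kind of information a purely chromatic (12-class) encoding would discard.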

[Figure labels: Fully connected; ConvGRU, ConvDil, PoolGRU; bass, full, class, spelling, chromatic; global, local]