Abstract

In audio-visual recordings of music performances, visual cues from instrument players exhibit good temporal correspondence with the audio signals and the music content. These correspondences provide useful information for estimating source associations, i.e., for identifying the affiliation between players and sound sources or score parts. In this paper, we propose a computational system that models audio-visual correspondences to achieve source association for Western chamber music ensembles including string, woodwind, and brass instruments. Through its three modules, the system models three typical types of correspondences: 1) between body motion (e.g., bowing for string instruments and sliding for trombone) and note onsets, 2) between finger motion (e.g., fingering for most woodwind and brass instruments) and note onsets, and 3) between vibrato hand motion (e.g., fingering-hand rolling for string instruments) and pitch fluctuations. Although the three modules are designed for estimating associations for different instruments, the overall system provides a universal framework for all common melodic instruments in Western chamber ensembles. The framework automatically and adaptively integrates the three modules without requiring prior knowledge of the instrument types. The system operates in an online fashion, i.e., associations are updated as the audio-visual stream progresses. We evaluate the system on ensembles with different instruments and polyphony, ranging from duets to quintets. Results demonstrate that association accuracy increases with the duration of the video excerpts. For string quintets, the accuracy exceeds 90% from just a 5-second video excerpt, while for woodwind, brass, and mixed-instrument quintets, a similar accuracy is reached after processing 30 seconds of video. These results are promising and enable novel applications such as interactive audio-visual music editing and auto-whirling cameras in concerts.
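The abstract does not state how the per-module correspondence scores are computed or combined. As a rough illustration only, the Python sketch below assumes that each module reduces to a player-by-part affinity matrix. It uses Pearson correlation between a hypothetical per-player motion curve and a per-part onset curve as the affinity, combines modules with fixed hypothetical weights (a simplification, not the adaptive integration described above), and takes the association to be the player-to-part permutation with the highest total affinity. Summing affinities window by window mimics the online updating. All function and class names are hypothetical and are not the authors' implementation.

```python
import math
from itertools import permutations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def affinity_matrix(motion, onsets):
    """affinity[i][j]: correlation of player i's motion curve with part j's onset curve.

    Assumption: both curves are sampled on the same time grid for one analysis window.
    """
    return [[pearson(m, o) for o in onsets] for m in motion]


def combine(matrices, weights):
    """Weighted sum of per-module affinity matrices (fixed, hypothetical weights)."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(w * M[i][j] for M, w in zip(matrices, weights)) for j in range(cols)]
        for i in range(rows)
    ]


def best_association(affinity):
    """Exhaustive search over player-to-part permutations (n! candidates, fine for n <= 5)."""
    n = len(affinity)
    best_perm, best_score = None, -math.inf
    for perm in permutations(range(n)):
        # perm[i] is the score part assigned to player i
        score = sum(affinity[i][perm[i]] for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score


class OnlineAssociator:
    """Accumulates per-window affinities and re-estimates the association after each window."""

    def __init__(self, n):
        self.total = [[0.0] * n for _ in range(n)]

    def update(self, window_affinity):
        for i, row in enumerate(window_affinity):
            for j, v in enumerate(row):
                self.total[i][j] += v
        return best_association(self.total)[0]


if __name__ == "__main__":
    # Toy example: player 0 moves in sync with part 1, player 1 with part 0.
    motion = [[0, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 0]]
    onsets = [[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 1, 0]]
    tracker = OnlineAssociator(2)
    print(tracker.update(affinity_matrix(motion, onsets)))
```

In the toy example, each player's motion curve matches exactly one part's onset curve, so the estimated association maps player 0 to part 1 and player 1 to part 0. Exhaustive search needs only 120 permutations for a quintet. For larger ensembles, a Hungarian-type solver such as SciPy's `linear_sum_assignment` (which minimizes cost, so pass the negated affinities) would scale better.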

Highlights

  • Visual aspects of music performances are often important

  • Even in prestigious classical music performances, research has shown that body movements and facial expressions of performers exert strong influences on the judgment of performance quality, for expert or novice audiences alike (Tsay, 2014)

  • Experiments on 17,574 audio-visual clips generated from 44 chamber music pieces in the URMP dataset (Li et al., 2019), which spans a polyphony range from duets to quintets, show that: 1) different modules are helpful for different instruments, and the system integrates them automatically to achieve a high overall accuracy; 2) accuracy increases as longer video streams become available, reaching an average accuracy of 90% for 5-second excerpts of string instruments and for 30-second excerpts of woodwind and brass instruments


Summary

Introduction

Visual aspects of music performances are often important. Performers use various kinds of body movements to express their emotions and to impress audiences (Parncutt and McPherson, 2002; Sörgjerd, 2000). Visual interactions among musicians are important for coordinating timing and dynamics. Creative visual performances give artists a substantial competitive advantage: including videos in music albums has been shown to provide an eight-percent boost, on average, in purchase intent and to improve perception (as measured by Nielsen Holdings). Even in prestigious classical music performances, research has shown that the body movements and facial expressions of performers strongly influence judgments of performance quality, for expert and novice audiences alike (Tsay, 2014).

