MusicYOLO: A Vision-Based Framework for Automatic Singing Transcription

Xianke Wang, Bowen Tian, Wenqing Cheng, Wei Xu, Weiming Yang

doi:10.1109/taslp.2022.3221005

Abstract

Automatic singing transcription (AST), the task of inferring note onsets, offsets, and pitches from singing audio, is of great significance in music information retrieval. Most AST models use a convolutional neural network to extract spectral features and predict onset and offset moments separately: frame-level probabilities are inferred first, and note-level transcription results are then obtained through post-processing. This paper proposes a new AST framework, MusicYOLO, which obtains note-level transcription results directly. Onset/offset detection is based on the object detection model YOLOX, and pitch labeling is completed by a spectrogram peak search. Compared with previous methods, MusicYOLO detects note objects rather than isolated onset/offset moments, thus greatly enhancing transcription performance. On the sight-singing vocal dataset (SSVD) established in this paper, MusicYOLO achieves an 84.60% transcription F1-score, a state-of-the-art result.
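
The abstract does not spell out how the spectrogram peak search works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a note's onset and offset are already known (for example, from a detected bounding box), averages the magnitude spectrum over that interval, and takes the strongest bin inside an assumed vocal band as the pitch. The function names, the 80-1000 Hz band, and the STFT settings are all hypothetical choices made for illustration.

```python
import numpy as np


def stft_magnitude(signal, sample_rate, n_fft=2048, hop=256):
    """Return (magnitudes[frames, bins], bin frequencies in Hz, frame times in s)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    mags = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # Each frame is time-stamped at its centre.
    times = (np.arange(n_frames) * hop + n_fft / 2) / sample_rate
    return mags, freqs, times


def label_pitch(mags, freqs, times, onset, offset, fmin=80.0, fmax=1000.0):
    """Assign one pitch to a note spanning [onset, offset] seconds.

    fmin/fmax are an assumed vocal range, not values taken from the paper.
    """
    frame_mask = (times >= onset) & (times <= offset)
    band_mask = (freqs >= fmin) & (freqs <= fmax)
    if not frame_mask.any():
        raise ValueError("note interval contains no analysis frames")
    # Average the spectrum over the note, then pick the strongest in-band bin.
    mean_spectrum = mags[frame_mask][:, band_mask].mean(axis=0)
    peak_hz = freqs[band_mask][np.argmax(mean_spectrum)]
    midi = 69 + 12 * np.log2(peak_hz / 440.0)  # A4 = 440 Hz = MIDI 69
    return float(peak_hz), int(round(midi))


# Usage: a synthetic one-second 440 Hz tone with a note box from 0.2 s to 0.8 s.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
mags, freqs, times = stft_magnitude(tone, sample_rate)
peak_hz, midi_pitch = label_pitch(mags, freqs, times, onset=0.2, offset=0.8)
print(peak_hz, midi_pitch)
```

In real singing, a harmonic can be stronger than the fundamental, so a plain argmax like this may land an octave too high. A practical peak search would likely need some handling for that case, which this sketch omits.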

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2023
Citations: 1
License type: CC BY 4.0
