Abstract
Mondegreen, the mishearing of speech, is a common phenomenon in conversation and is even more pronounced when listening to songs. To simulate the impression an audience has when hearing a particular piece, this paper introduces the Soramiminet model, a combination of wav2vec and a transformer that generates misheard lyrics in Chinese from various input songs. This article focuses on the construction of the dataset for the wav2vec model. The criteria for, and several methods of, creating a high-quality dataset are summarized and presented. Additionally, the negative impact of a defective dataset, and how to avoid it, is discussed. The main limitations and biases of this dataset, ways to address them, and directions for future work are also explored. The dataset consists of audio clips from 21 tracks by 9 different singers in AVI format, together with a text file of annotations that can be processed by the Python dataset module. The dataset is relatively biased, since all annotations were made by the author personally, and there is no single correct labelling, given that the model is intended to generate misheard, i.e. deliberately wrong, results.