Abstract

Context. Pre-trained large language models are currently the driving force behind the development not only of NLP but of deep learning systems in general. Transformer models can address a wide range of language problems, provided that certain requirements and training practices are met. Words, sentences and texts are, in turn, the basic and most important means of communication between intelligent beings, and speech and text are used to convey emotions, events, and so on. One of the main ways language is used to express experienced emotions is songs with lyrics. However, to preserve rhyme, the metre of verse lines, song structure, and so on, artists often have to repeat lines in their lyrics. In addition, the process of writing lyrics can be long.
Objective. The objective of the study is to develop an information technology for generating continuations of song lyrics based on the T5 machine learning model, both with (SA, specific author) and without (NSA, non-specific author) consideration of the author's style.
Method. The choice of decoding strategy is important for the generation process. However, instead of favouring a single strategy, the system supports multiple strategies, in particular the following eight: contrastive search, top-p sampling, top-k sampling, multinomial sampling, beam search, diverse beam search, greedy search, and beam-search multinomial sampling.
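As a rough illustration (not the authors' exact code), the eight strategies correspond to different keyword arguments of the Hugging Face `generate()` API; the checkpoint name "t5-base", the hyperparameter values, and the prompt below are placeholder assumptions.

```python
# Minimal sketch: mapping the eight decoding strategies onto transformers
# `generate()` parameters for a T5 sequence-to-sequence model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")   # placeholder checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

STRATEGIES = {
    "greedy":              dict(do_sample=False, num_beams=1),
    "contrastive":         dict(do_sample=False, penalty_alpha=0.6, top_k=4),
    "multinomial":         dict(do_sample=True,  num_beams=1),
    "top_k":               dict(do_sample=True,  top_k=50),
    "top_p":               dict(do_sample=True,  top_p=0.92),
    "beam_search":         dict(do_sample=False, num_beams=5),
    "beam_multinomial":    dict(do_sample=True,  num_beams=5),
    "diverse_beam_search": dict(do_sample=False, num_beams=6,
                                num_beam_groups=3, diversity_penalty=1.0),
}

def continue_lyrics(prompt: str, strategy: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation of `prompt` with the chosen decoding strategy."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs,
                                max_new_tokens=max_new_tokens,
                                **STRATEGIES[strategy])
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```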
Results. A machine learning model was developed to generate continuations of song lyrics using large language models, in particular the T5 model, in order to accelerate, complement, and increase the flexibility of the songwriting process.
Conclusions. The created model shows excellent results in generating continuations of song lyrics on the test data. Analysis of the raw data showed that the NSA model degrades less, while the SA model requires balancing the amount of text per author. Several text metrics, such as BLEU, ROUGE-L and ROUGE-N, were computed to quantitatively compare the results of the models and generation strategies. The BLEU metric is the most variable, and its value changes significantly depending on the strategy, while the ROUGE metrics show less variability and a narrower range of values. Eight different decoding methods for text generation, supported by the transformers library, were compared. Across all text comparisons, the metrically best methods for song lyric generation are beam search and its variations, in particular beam-search multinomial sampling. Contrastive search usually outperformed the conventional greedy approach. The top-p and top-k methods are not clearly superior to each other and gave different results in different situations.
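For context, a minimal sketch of how such metrics can be computed with the Hugging Face `evaluate` library follows; the metric identifiers "bleu" and "rouge" are the standard library names, and the example strings are placeholders rather than the paper's data.

```python
# Minimal sketch: scoring generated continuations against reference lyrics
# with corpus BLEU and ROUGE-1 / ROUGE-L.
import evaluate

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

predictions = ["generated continuation of the verse"]   # model outputs
references = [["reference continuation of the verse"]]  # ground-truth lyrics

bleu_score = bleu.compute(predictions=predictions, references=references)
rouge_scores = rouge.compute(predictions=predictions,
                             references=[r[0] for r in references])

print(bleu_score["bleu"])        # corpus BLEU
print(rouge_scores["rouge1"])    # ROUGE-N (unigram overlap)
print(rouge_scores["rougeL"])    # ROUGE-L (longest common subsequence)
```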
