Abstract

Music is closely tied to human life and is an important way for people to express their feelings. Deep neural networks have played a significant role in the field of music processing, and many different network models have been used to implement deep learning for audio. General-purpose neural networks, however, suffer from complex operation and slow computation. In this paper, we introduce Long Short-Term Memory (LSTM), a recurrent neural network, to realize end-to-end training. The network structure is simple and can generate good audio sequences after training. After music generation, human voice conversion is important for understanding music and for inserting lyrics into pure instrumental music. We propose an audio segmentation technique that divides the recorded human voice into fixed-length segments. Different notes are classified from piano music without considering the scale and are associated with the different human voice segments we obtain. Finally, through this transformation, the generated piano music can be expressed through human voice output. Experimental results demonstrate that the proposed scheme can successfully obtain a human voice from pure piano music generated by LSTM.
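For illustration, the fixed-length voice segmentation described above could be sketched as follows. This is a minimal sketch assuming librosa for audio loading; the segment length and file name are placeholder assumptions, not the parameters used in the paper.

```python
# Minimal sketch of fixed-length voice segmentation (assumed parameters).
import librosa

SEGMENT_SEC = 0.5  # assumed fixed segment length in seconds


def segment_voice(path, segment_sec=SEGMENT_SEC):
    """Split a recorded voice file into equal fixed-length segments."""
    audio, sr = librosa.load(path, sr=None, mono=True)
    hop = int(segment_sec * sr)
    # Drop the trailing remainder so every segment has the same length.
    n_full = len(audio) // hop
    return [audio[i * hop:(i + 1) * hop] for i in range(n_full)], sr


# Each fixed-length voice segment can then be associated with one piano note class:
# segments, sr = segment_voice("voice_recording.wav")  # hypothetical file name
```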

Highlights

  • Music, as an art form expressing emotions, is in high demand in the market

  • Various related models have been applied to the study of the music generation problem

  • Long Short-Term Memory (LSTM) adds self-recurrence to keep gradients flowing, which effectively addresses the problems of long-term dependencies and exploding gradients [7]. By training the LSTM music generation model on a large number of piano pieces, new piano music can be generated automatically



Introduction

Music, as an art form expressing emotions, is in high demand in the market. At present, the number of professional music creators is limited, and music production takes time, effort, and money [1]. With the development of deep learning, the music generation problem has come back into view, and we can use models to generate piano music. LSTM adds self-recurrence to keep gradients flowing, which effectively addresses the problems of long-term dependencies and exploding gradients [7]. By training the LSTM music generation model on a large number of piano pieces, new piano music can be generated automatically. This paper mainly uses a mature piano music generation model, LSTM, and converts the generated piano music to the human voice [8][9]. At present, our experiment can only perform voice conversion for single-key piano repertoire. The transformation between piano music and human voice information lays a foundation for the next step: automatically generating music with complete lyrics and emotions.
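As a concrete illustration of the generation step, a next-note LSTM could look like the following minimal sketch. It assumes a Keras setup and a piano corpus encoded as integer note IDs; the vocabulary size, window length, and layer sizes are illustrative assumptions, not the configuration used in this paper.

```python
# Minimal sketch of an LSTM next-note model (assumed architecture and sizes).
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE = 128   # assumed number of distinct piano notes/events
SEQ_LEN = 50       # assumed length of the input note window

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 64),                # map note IDs to vectors
    layers.LSTM(256),                                # gated recurrent memory cells
    layers.Dense(VOCAB_SIZE, activation="softmax"),  # distribution over the next note
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Train on (window, next-note) pairs extracted from the piano corpus,
# then sample notes autoregressively to generate a new sequence.
x = np.random.randint(0, VOCAB_SIZE, size=(32, SEQ_LEN))  # placeholder windows
y = np.random.randint(0, VOCAB_SIZE, size=(32,))          # placeholder next notes
model.fit(x, y, epochs=1, verbose=0)
```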

Music expression
Existing methods for music generation
Model and formulation
Human voice section
Experiments
Training
Conclusions