Controlling Melody Structures in Automatic Game Soundtrack Compositions With Adversarial Learning Guided Gaussian Mixture Models

Zhiyang Xiang, Yibo Guo

doi:10.1109/tg.2020.3035593

Abstract

The vastness of game plots and the variety of environments in computer games demand a large amount of labor in soundtrack composition. Because human composers are expensive, several open-source projects have proposed artificial intelligence composition techniques. Current techniques perform well at improvising short melodies but face great challenges in industrial-level automatic composition of highly structured tracks for games. In this article, the overall structure, which specifies the transitions and repetitions of melodies, is given by a human, and detailed content such as notes and rhythms is completed with a Gaussian mixture model (GMM) and generative adversarial nets (GANs). Unlike recurrent neural networks, which are the mainstream automated melody generators, the GMM can be controlled to form structures because its latent space is often similar to the data space. A layered framework is devised in which the basic layer composes melodies and the higher-level layers organize melodies according to long-term structures. In each layer, a Gaussian mixture generative model with constraints is constructed to compose candidate tracks, while another GMM network is trained in competition with the generator so that the optimal tracks from the generator can be identified. Experiments show that the proposed framework composes acceptable soundtracks at a high rate. Calculated entropy curves show that the composed tracks are more similar to game soundtracks than those of existing methods. In a user study, 11 of 16 human evaluators favored the proposed compositions over those of the original GAN and recurrent neural networks.
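The idea of filling a human-given structure with mixture-sampled melodies and then keeping the best candidate can be illustrated with a toy sketch. This is not the authors' implementation: the mixture parameters, the `AABA` structure string, and all function names are hypothetical, the adversarial training is omitted, and the critic is replaced by a fixed log-likelihood score under a second, hand-set mixture.

```python
import math
import random

# Hypothetical mixtures over MIDI pitches: (weight, mean, std) per component.
GENERATOR = [(0.5, 60.0, 2.0), (0.3, 67.0, 2.0), (0.2, 72.0, 1.5)]
CRITIC = [(0.6, 62.0, 3.0), (0.4, 69.0, 3.0)]  # stand-in for a trained critic


def sample_phrase(rng, mixture, length=8):
    """Draw one phrase of MIDI pitches from a 1-D Gaussian mixture."""
    weights = [w for w, _, _ in mixture]
    phrase = []
    for _ in range(length):
        _, mean, std = rng.choices(mixture, weights=weights, k=1)[0]
        pitch = int(round(rng.gauss(mean, std)))
        phrase.append(max(0, min(127, pitch)))  # clamp to the MIDI range
    return phrase


def compose(rng, structure="AABA", length=8):
    """Fill a human-given structure: equal labels reuse the same phrase."""
    phrases = {}
    track = []
    for label in structure:
        if label not in phrases:
            phrases[label] = sample_phrase(rng, GENERATOR, length)
        track.extend(phrases[label])
    return track


def log_likelihood(track, mixture=CRITIC):
    """Score a track by its log-likelihood under the critic mixture."""
    total = 0.0
    for pitch in track:
        density = sum(
            w * math.exp(-0.5 * ((pitch - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for w, m, s in mixture
        )
        total += math.log(density + 1e-12)  # small constant avoids log(0)
    return total


rng = random.Random(0)
candidates = [compose(rng) for _ in range(20)]
best = max(candidates, key=log_likelihood)
```

In this sketch the repetition structure is enforced by construction, since every section with the same label shares one sampled phrase, and selection among candidates is a simple argmax over the stand-in score.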

Full Text