Generative Adversarial Networks (GANs) are a transformative deep learning framework that has been widely applied to image, video, speech, and text processing. However, GANs still suffer from drawbacks such as mode collapse and training instability. To address these challenges, this paper proposes an Auto-Encoding GAN composed of a set of generators, a discriminator, an encoder, and a decoder. The generators learn diverse modes, and the discriminator distinguishes real samples from generated ones. The encoder maps generated and real samples into an embedding space to encode distinguishable features, and the decoder determines which generator a generated sample comes from and which mode a real sample belongs to. These components are jointly optimized during training to enhance the feature representation. Moreover, a clustering algorithm is employed to perceive the distributions of real and generated samples, and a cluster-center matching algorithm is accordingly constructed to maintain consistency between these distributions, preventing multiple generators from covering the same mode. Extensive experiments are conducted on two classes of datasets, and the results visually and quantitatively demonstrate the proposed model's ability to reduce mode collapse and enhance feature representation.
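The cluster-center matching idea can be illustrated with a minimal sketch: compute the embedding-space center of each real-sample cluster and each generator's sample cluster, then pair them one-to-one so no two generators claim the same mode. The function names and the greedy nearest-pair strategy below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def cluster_centers(samples, labels, k):
    # Mean embedding of each of the k clusters; rows are centers.
    # (Labels would come from a clustering algorithm such as k-means.)
    return np.stack([samples[labels == c].mean(axis=0) for c in range(k)])

def match_centers(real_centers, gen_centers):
    """Greedily assign each generator's cluster center to its nearest
    unclaimed real-mode center, so modes are covered one-to-one.
    (Hypothetical sketch; a real system might use optimal assignment.)"""
    k = len(gen_centers)
    # Pairwise Euclidean distances: dist[g, r] between generator g and mode r.
    dist = np.linalg.norm(gen_centers[:, None, :] - real_centers[None, :, :],
                          axis=2)
    assignment, taken = {}, set()
    # Visit (generator, mode) pairs in order of increasing distance.
    for g, r in sorted(((g, r) for g in range(k) for r in range(k)),
                       key=lambda p: dist[p[0], p[1]]):
        if g not in assignment and r not in taken:
            assignment[g] = r
            taken.add(r)
    return assignment

# Two well-separated real modes and two generators, each near one mode:
real = np.array([[0.0, 0.0], [10.0, 10.0]])
gen = np.array([[9.0, 9.0], [1.0, 1.0]])
print(match_centers(real, gen))  # → {0: 1, 1: 0}
```

The one-to-one constraint is the point: without it, both generators could drift toward the same dense mode, which is exactly the collapse the matching step is meant to prevent.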