The aim of this study was to develop a deep neural network for respiratory motion compensation in free-breathing cine MRI and evaluate its performance. An adversarial autoencoder network was trained using unpaired training data from healthy volunteers and patients who underwent clinically indicated cardiac MRI examinations. A U-net structure was used for the encoder and decoder parts of the network, and the code space was regularized by an adversarial objective. The autoencoder learns the identity map for the free-breathing motion-corrupted images and preserves the structural content of the images, while the discriminator, which interacts with the output of the encoder, forces the encoder to remove motion artifacts. The network was first evaluated on data artificially corrupted with simulated rigid motion, assessing both motion-correction accuracy and the presence of any artificially created structures. Subsequently, to demonstrate the feasibility of the proposed approach in vivo, the network was trained on respiratory motion-corrupted images in an unpaired manner and was tested on volunteer and patient data. In the simulation study, mean structural similarity index scores for the synthesized motion-corrupted images and motion-corrected images were 0.76 and 0.93 (out of 1), respectively. The proposed method increased the Tenengrad focus measure of the motion-corrupted images by 12% in the simulation study and by 7% in the in vivo study. The average overall subjective image quality scores for the motion-corrupted images, motion-corrected images and breath-held images were 2.5, 3.5 and 4.1 (out of 5.0), respectively. Nonparametric paired comparisons showed a significant difference between the image quality scores of the motion-corrupted and breath-held images (P < .05); however, after correction there was no significant difference between the image quality scores of the motion-corrected and breath-held images.
This feasibility study demonstrates the potential of an adversarial autoencoder network for correcting respiratory motion-related image artifacts without requiring paired data.
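
For readers unfamiliar with the Tenengrad focus measure reported above, the following is a minimal sketch of how such a sharpness score and its relative change could be computed. It assumes the common definition (mean squared Sobel gradient magnitude, with no threshold). The abstract does not state the exact variant, normalization, or region of interest used in the study, so the function names and choices here are illustrative and not the authors' implementation.

```python
import numpy as np


def sobel_gradients(image):
    """Return horizontal and vertical Sobel responses, same shape as image."""
    img = np.asarray(image, dtype=np.float64)
    p = np.pad(img, 1, mode="edge")  # replicate borders (an assumption)
    # Horizontal gradient: right column minus left column, weights 1-2-1.
    gx = (p[:-2, 2:] + 2.0 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2.0 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weights 1-2-1.
    gy = (p[2:, :-2] + 2.0 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2.0 * p[:-2, 1:-1] - p[:-2, 2:])
    return gx, gy


def tenengrad(image):
    """Mean squared Sobel gradient magnitude; higher means sharper edges."""
    gx, gy = sobel_gradients(image)
    return float(np.mean(gx ** 2 + gy ** 2))


def relative_change_percent(before, after):
    """Percentage change in Tenengrad from `before` to `after`."""
    base = tenengrad(before)
    return 100.0 * (tenengrad(after) - base) / base
```

Under this definition, a figure such as the 12% increase quoted above would correspond to `relative_change_percent(corrupted, corrected)` returning roughly 12 for a given image pair, although the study may have averaged over frames or subjects in a way the abstract does not specify.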