In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. VC is a technique where only speaker-specific information in source speech is converted while keeping the phonological information unchanged. Most of the existing VC methods rely on parallel data—pairs of speech data from the source and target speakers uttering the same sentences. However, the use of parallel data in training causes several problems: 1) the data used for the training are limited to the predefined sentences, 2) the trained model is only applied to the speaker pair used in the training, and 3) mismatches in alignment may occur. Although it is, thus, fairly preferable in VC not to use parallel data, a nonparallel approach is considered difficult to learn. In our approach, we achieve nonparallel training based on a speaker adaptation technique and capturing latent phonological information. This approach assumes that speech signals are produced from a restricted Boltzmann machine-based probabilistic model, where phonological information and speaker-related information are defined explicitly. Speaker-independent and speaker-dependent parameters are simultaneously trained under speaker adaptive training. In the conversion stage, a given speech signal is decomposed into phonological and speaker-related information, the speaker-related information is replaced with that of the desired speaker, and then voice-converted speech is obtained by mixing the two. Our experimental results showed that our approach outperformed another nonparallel approach, and produced results similar to those of the popular conventional Gaussian mixture models-based method that used parallel data in subjective and objective criteria.