A probabilistic topic model (PTM) combined with the bag-of-visual-words model is a common approach to bridging the so-called "semantic gap" in remote-sensing image classification. Owing to the inherent shortcomings of PTMs, such as high computational cost and a failure to consider the spatial arrangement of objects, we introduce a document-to-vector (Doc2Vec) model from natural language processing, instead of a PTM, to capture the high-level semantic information of the images. The model represents words and documents as dense, low-dimensional vectors and trains them with a simplified, shallow neural network, offering a new perspective for mining the semantic information of remote-sensing images. We also improve the quality of the low-level features by using feature-specific sampling methods. Two high-spatial-resolution remote-sensing image datasets, UC Merced and RSSCN7, are employed in a scene classification experiment to evaluate the performance of the Doc2Vec model. The experimental results show that the Doc2Vec model mines the semantic information of the images effectively and compares favorably with state-of-the-art methods.
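To make the idea concrete, the following is a minimal, illustrative pure-Python sketch of the distributed-memory (PV-DM) variant of Doc2Vec, in which a document vector and the surrounding word vectors jointly predict the next word through a shallow softmax network. This is not the authors' implementation: the tiny "visual word" documents, the one-word context window, and all hyperparameters are invented for the example.

```python
import math
import random

random.seed(0)

def train_pv_dm(docs, dim=8, lr=0.05, epochs=200):
    """Toy PV-DM trainer: document vector + context word vectors predict
    the target word via a shallow softmax layer (illustrative only)."""
    vocab = sorted({w for d in docs for w in d})
    widx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Dense, low-dimensional embeddings for words and documents.
    wvec = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(V)]
    dvec = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in docs]
    # Output weights of the shallow network (no hidden nonlinearity).
    out = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(V)]
    for _ in range(epochs):
        for di, doc in enumerate(docs):
            for t in range(1, len(doc)):
                ctx = [widx[doc[t - 1]]]       # one-word context window
                target = widx[doc[t]]
                # Hidden state: average of document and context vectors.
                h = [(dvec[di][k] + sum(wvec[c][k] for c in ctx)) / (1 + len(ctx))
                     for k in range(dim)]
                # Softmax over the vocabulary.
                scores = [sum(out[v][k] * h[k] for k in range(dim)) for v in range(V)]
                m = max(scores)
                exps = [math.exp(s - m) for s in scores]
                Z = sum(exps)
                probs = [e / Z for e in exps]
                # Cross-entropy gradient w.r.t. h and the output weights.
                gh = [0.0] * dim
                for v in range(V):
                    err = probs[v] - (1.0 if v == target else 0.0)
                    for k in range(dim):
                        gh[k] += err * out[v][k]
                        out[v][k] -= lr * err * h[k]
                # Back-propagate into the document and word embeddings.
                for k in range(dim):
                    g = gh[k] / (1 + len(ctx))
                    dvec[di][k] -= lr * g
                    for c in ctx:
                        wvec[c][k] -= lr * g
    return dvec, wvec, vocab

# Tiny "visual word" documents standing in for two image scenes.
docs = [["roof", "road", "roof", "car"],
        ["tree", "grass", "tree", "water"]]
dvecs, _, _ = train_pv_dm(docs)
```

After training, each document vector in `dvecs` is a dense, low-dimensional representation of its scene, analogous to how the paper's Doc2Vec model summarizes an image's visual-word sequence; a practical system would instead use an optimized library implementation such as gensim's `Doc2Vec`.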