Fine-grained vehicle model recognition is a challenging problem in intelligent transportation systems due to the subtle intra-category appearance variation. In this paper, we demonstrate that this problem can be addressed by locating discriminative parts, where the most significant appearance variation appears, based on the large-scale training set. We also propose a corresponding coarse-to-fine method to achieve this, in which these discriminative regions are detected automatically based on feature maps extracted by convolutional neural network. A mapping from feature maps to the input image is established to locate the regions, and these regions are repeatedly refined until there are no more qualified ones. The global and local features are then extracted from the whole vehicle images and the detected regions, respectively. Based upon the holistic cues and the subordinate-level variation within these global and local features, an one-versus-all support vector machine classifier is applied for classification. The experimental results show that our framework outperforms most of the state-of-the-art approaches, achieving 98.29% accuracy over 281 vehicle makes and models.