Handcrafted features are limited in that they are often not directly applicable to practical problems. In addition, deep convolutional features are high-dimensional, so using them directly for image matching consumes considerable time and memory. Moreover, features from higher layers may be contaminated by dramatic variations in human pose or by background clutter. This paper proposes a method based on deep multi-feature distance metric learning. First, after the deep convolutional feature map is extracted from the last layer of the CNN, each spatial position and each channel are weighted, and the final aggregated result, i.e., the image feature, is obtained by sum-pooling. Second, a new method is proposed to refine and integrate regional convolutional features: the feature map is processed with a sliding-window technique, yielding a low-dimensional feature vector whose dimension equals the number of channels in the convolutional layer. Third, a distance learning algorithm based on cross-view quadratic discriminant analysis (XQDA) metric learning is proposed. Finally, a weighted fusion strategy is used to combine the handcrafted and deep convolutional features. The experimental results show that the Rank-1 accuracy of the proposed method reaches 90.02% on the Market-1501 dataset and 68.74% on the VIPeR dataset. Under the new evaluation protocol of the CUHK03 dataset, the Rank-1 accuracy of the proposed method reaches 34.2%. The experimental results show that the accuracy of person re-identification after distance-weighted fusion is higher than that obtained with the distance metric of either feature alone.
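The aggregation and fusion steps summarized above can be sketched in NumPy. This is a minimal illustration under stated assumptions: the function names, the window size, and the fusion weight `alpha` are illustrative choices, not taken from the paper, and the weighting scheme is simplified to elementwise spatial and channel weights.

```python
import numpy as np

def aggregate_deep_feature(conv_map, spatial_w, channel_w):
    """Weighted sum-pooling of a conv feature map.

    conv_map:  (C, H, W) feature map from the last CNN layer
    spatial_w: (H, W) per-position weights
    channel_w: (C,) per-channel weights
    Returns a C-dimensional image feature vector.
    """
    weighted = conv_map * channel_w[:, None, None] * spatial_w[None, :, :]
    return weighted.sum(axis=(1, 2))  # (C,)

def sliding_region_features(conv_map, win=3, stride=1):
    """Sliding-window pooling over the feature map.

    Each window is sum-pooled into a vector whose dimension equals the
    number of channels, as described for the regional feature step.
    """
    C, H, W = conv_map.shape
    feats = []
    for i in range(0, H - win + 1, stride):
        for j in range(0, W - win + 1, stride):
            feats.append(conv_map[:, i:i + win, j:j + win].sum(axis=(1, 2)))
    return np.stack(feats)  # (num_windows, C)

def fused_distance(d_handcrafted, d_deep, alpha=0.5):
    """Distance-weighted fusion of handcrafted and deep feature distances.

    alpha is a hypothetical fusion weight; the paper learns/tunes this
    trade-off rather than fixing it at 0.5.
    """
    return alpha * d_handcrafted + (1 - alpha) * d_deep
```

In a full pipeline, `d_handcrafted` and `d_deep` would each come from a metric learned with XQDA on the respective feature; the fusion then combines the two learned distances before ranking.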