Most research on forest tree species classification from optical imagery relies on spectral reflectance, vegetation indices, texture, and phenology. However, owing to the limited spectral resolution of multispectral images and the high cost of hyperspectral data, there is still room for improvement in classifying tree species over large areas from optical images alone. Combining multispectral images with auxiliary data offers a new way to improve tree species classification accuracy. In this study, Sentinel-2 images were used to extract spectral reflectance, spectral index, texture, and phenological information, and data on topography, precipitation, air temperature, ultraviolet aerosol index, NO2 concentration, and other variables were included as auxiliary data. Models for forest tree species classification were constructed through feature combination and feature optimization using the random forest (RF), gradient tree boost (GTB), support vector machine (SVM), and classification and regression tree (CART) algorithms. The classification results of 16 feature combinations with the 4 classification methods were compared, and the contributions of different features to the classification models were evaluated. Finally, the optimal classification model was selected to map the spatial distribution of forest tree species in the study area. The model based on feature optimization gave the best results among the 16 feature combination models. Its overall accuracy and kappa coefficient were 18% and 0.21 higher, respectively, than those of the spectral classification model, and 17% and 0.20 higher, respectively, than those of the spectral and spectral index classification model. Analysis of the feature optimization model showed that terrain, ultraviolet aerosol index, and phenological information were the three most important features. 
Although spectral reflectance and spectral index features ranked lower in importance, they accounted for a large proportion of the total number of feature variables. The importance of the commonly used texture features was limited, and none of them were retained in the feature optimization model. Among the four classification algorithms, RF had the highest classification accuracy, with an overall accuracy of 82.69% and a kappa coefficient of 0.80. The results of GTB were close to those of RF, with a difference in overall accuracy of only 0.14%. The SVM and CART algorithms performed worse, with overall accuracies of about 70%. It can be concluded that the combined application of Sentinel-2 images and auxiliary data can improve forest tree species classification accuracy. The model based on feature optimization achieved the highest classification accuracy among the 16 feature combination models. The spectral reflectance and spectral index data extracted from optical images are useful for tree species classification, but the effect of texture features was very limited. Auxiliary data, such as topographic features, ultraviolet aerosol index, phenological features, NO2 concentration features, topographic diversity features, precipitation features, temperature features, and multi-scale topographic location index data, can effectively improve forest tree species classification accuracy. The RF algorithm had the highest accuracy and can be used to map the spatial distribution of tree species. Nevertheless, the highest accuracy achieved by the model was only 82.69%, which leaves room for improvement. 
Future research could therefore combine more effective auxiliary data and vertical structural parameters extracted from satellite LiDAR with multispectral images to further improve forest tree species classification accuracy.
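
The accuracy figures above are reported as overall accuracy and kappa coefficient. As a minimal illustration of how these two metrics relate, the sketch below computes both from paired reference and predicted labels. It assumes that the kappa coefficient is Cohen's kappa. The function name `overall_accuracy_and_kappa` and the species labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def overall_accuracy_and_kappa(reference, predicted):
    """Return (overall accuracy, Cohen's kappa) for two label sequences.

    Hypothetical helper for illustration; assumes kappa means Cohen's kappa.
    """
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(reference)

    # Observed agreement: share of samples whose predicted class
    # matches the reference class (this is the overall accuracy).
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement: for each class, the product of its share in the
    # reference labels and its share in the predicted labels, summed.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    p_e = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)

    if p_e == 1.0:
        # Only one class occurs in both sequences, so kappa is undefined.
        return p_o, float("nan")

    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa


# Made-up validation labels for ten sample plots (not data from the study).
reference = ["pine"] * 4 + ["oak"] * 3 + ["birch"] * 3
predicted = ["pine", "pine", "pine", "oak",
             "oak", "oak", "birch",
             "birch", "birch", "pine"]

oa, kappa = overall_accuracy_and_kappa(reference, predicted)
print(f"overall accuracy = {oa:.2f}, kappa = {kappa:.2f}")
```

Because kappa discounts the agreement expected by chance, it is lower than the overall accuracy whenever the classification is imperfect, which is consistent with the reported pair of 82.69% and 0.80.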