Information on urban tree species resources is vital to the planning and design of urban green spaces. Tree organs such as bark serve as primary features for identifying tree species. However, traditional tree identification methods are labor-intensive and time-consuming. In addition, applying machine image recognition to tree species identification still faces problems such as a heavy data preprocessing workload, small numbers of tree species images, imbalanced class distributions, and low recognition accuracy. To promote the intelligent management of urban forestry and address these problems, an automatic image recognition model for urban greening tree species is needed. We captured bark images of 21 urban afforestation tree species in their natural environment and constructed a dataset that was divided into training, validation, and test sets in a ratio of 7:1:2. We combined a Channel Attention Module (CAM) with Spatial Pyramid Pooling (SPP) and Mixed Depthwise Dilated Convolutional Kernels, with the Mixed Convolution Kernel (MK) as the core algorithm, to construct CAMP-MKNet, a Convolutional Neural Network (CNN) for classifying bark images of urban greening tree species. The overall accuracy of the generic models ranged from 41.06% to 82.03%, whereas the overall accuracy of the experimental CAMP-MKNet model was 84.25%, with lower prediction cost. Our study shows that the CAMP-MKNet CNN model offers better prediction performance at lower computational cost and can provide crucial insights and technical support for developing automated urban tree species image recognition systems.
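As an illustration of the 7:1:2 division described above, the following is a minimal sketch of a per-class (stratified) split. The abstract does not state whether the split was stratified, so that choice is an assumption made here because the class distribution is reported as uneven; the function name `stratified_split`, the seed, and the toy labels (21 classes with 100 images each) are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.7, 0.1, 0.2), seed=0):
    """Split sample indices into train/val/test lists, class by class.

    Assumption: splitting within each class keeps the 7:1:2 ratio for
    every species; the source does not say how its split was drawn.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):  # labels must be mutually comparable
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * ratios[0])
        n_val = round(len(idxs) * ratios[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy labels (hypothetical): 21 classes, 100 images per class.
labels = [c for c in range(21) for _ in range(100)]
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 1470 210 420
```

With real data, the class sizes would differ, and rounding within each class means the overall proportions would only approximate 7:1:2.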