ABSTRACT Individual tree detection in urban areas from unmanned aerial vehicle (UAV) RGB imagery is challenging because of the diverse shapes and structures of urban trees and the complexity of urban forests. A digital surface model (DSM) provides elevation data, and fusing UAV RGB imagery with elevation data has emerged as a promising approach for tree detection. Here, we constructed a novel network based on the faster region-based convolutional neural network (Faster R-CNN) to detect camphor trees in urban environments from RGB-DSM data. First, we proposed an attention fusion module that fuses the RGB and DSM features by leveraging their complementarity. Second, we introduced the bidirectional feature pyramid network (BiFPN) to improve the detection of camphor tree crowns of varying sizes. Our approach detected urban camphor trees with an average precision (AP) of 85.7% and, notably, achieved an AP of 81.3% in urban green spaces. These results indicate that the approach is feasible for detecting camphor trees in urban areas and has the potential to facilitate urban forestry research and applications.
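
The abstract does not specify the internal design of the attention fusion module, so the following is only a minimal sketch of the general idea of gating two modalities per channel, written in plain Python. The function names (`global_avg_pool`, `attention_fuse`) and the fixed gating rule are assumptions made for illustration; a trained module would learn its gates from data rather than compute them from channel means.

```python
import math


def global_avg_pool(feat):
    """Return one mean per channel; feat is a list of channels, each an H x W nested list."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_fuse(rgb_feat, dsm_feat):
    """Fuse two feature maps of identical shape with one gate per channel.

    Hypothetical rule (not the paper's module):
        gate_c  = sigmoid(mean(rgb_c) - mean(dsm_c))
        fused_c = gate_c * rgb_c + (1 - gate_c) * dsm_c
    """
    if len(rgb_feat) != len(dsm_feat):
        raise ValueError("RGB and DSM features must have the same number of channels")

    rgb_means = global_avg_pool(rgb_feat)
    dsm_means = global_avg_pool(dsm_feat)

    fused = []
    for rgb_ch, dsm_ch, m_rgb, m_dsm in zip(rgb_feat, dsm_feat, rgb_means, dsm_means):
        gate = sigmoid(m_rgb - m_dsm)  # larger when the RGB channel responds more strongly
        fused.append([
            [gate * r + (1.0 - gate) * d for r, d in zip(rgb_row, dsm_row)]
            for rgb_row, dsm_row in zip(rgb_ch, dsm_ch)
        ])
    return fused
```

Each fused channel is a convex combination of the two inputs, so a channel in which both modalities agree passes through unchanged, while a channel in which one modality responds more strongly is weighted toward that modality.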