The circumpolar Taiga–Tundra Ecotone strongly influences the feedback mechanisms of global climate change. Large-scale individual tree crown (ITC) extraction in this transition zone is crucial for estimating vegetation biomass and for studying how plants respond to climate change. This study used aerial images and airborne LiDAR data covering several typical transition zone regions in northern Finland to explore ITC delineation based on deep learning. First, an improved multi-scale ITC delineation method was developed to enable semi-automatic assembly of the ITC sample collection, which yielded an individual tree dataset containing over 20,000 trees in the transition zone. Then, ITC delineation with the Mask R-CNN model was explored, and its accuracy was compared with that of two traditional ITC delineation methods: the improved multi-scale ITC delineation method and the local maxima clustering method based on point cloud distribution. For trees taller than 1.3 m, the Mask R-CNN model achieved an overall recall rate (Ar) of 96.60%. This Ar was 1.99 and 5.52 percentage points higher than that of the two traditional methods, respectively, indicating that the Mask R-CNN model improves the accuracy of ITC delineation. These results demonstrate, for the first time, the potential of Mask R-CNN for extracting low trees with relatively small crowns in the transition zone from high-resolution aerial imagery and low-density airborne point cloud data.
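The recall comparison above is stated in percentage points, not relative percent. The following minimal sketch makes that arithmetic explicit. It assumes the overall recall rate is the share of reference trees that were correctly delineated, a definition the abstract does not spell out, and the function `recall` and the example counts are hypothetical. The baseline values are not reported directly; they are only what the stated differences imply by subtraction.

```python
def recall(matched: int, reference: int) -> float:
    """Recall in percent: correctly delineated trees / reference trees.

    Assumed definition; the abstract does not define Ar explicitly.
    """
    if reference <= 0:
        raise ValueError("reference must be a positive tree count")
    return 100.0 * matched / reference


# Figures taken from the abstract (trees taller than 1.3 m).
mask_rcnn_ar = 96.60
gain_points = {
    "improved multi-scale": 1.99,     # percentage points
    "local maxima clustering": 5.52,  # percentage points
}

# Baseline recalls implied by simple subtraction (derived, not reported).
implied_baseline = {
    method: round(mask_rcnn_ar - gain, 2) for method, gain in gain_points.items()
}

# Hypothetical counts, only to illustrate the recall formula.
example_ar = recall(matched=966, reference=1000)

if __name__ == "__main__":
    for method, ar in implied_baseline.items():
        print(f"{method}: implied Ar = {ar:.2f}%")
    print(f"example recall = {example_ar:.2f}%")
```

Under these assumptions the implied baseline recalls are about 94.61% for the improved multi-scale method and 91.08% for local maxima clustering.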