Biodiversity is a characteristic of ecosystems that plays a crucial role in the study of their evolution, and to estimate it, the species of all plants need to be determined. In this study, we used Unmanned Aerial Vehicles to gather RGB images of mid-to-high-altitude ecosystems in the Zao mountains (Japan). All the data-collection missions took place in autumn so the plants present distinctive seasonal coloration. Patches from single trees and bushes were manually extracted from the collected orthomosaics. Subsequently, Deep Learning image-classification networks were used to automatically determine the species of each tree or bush and estimate biodiversity. Both Convolutional Neural Networks (CNNs) and Transformer-based models were considered (ResNet, RegNet, ConvNeXt, and SwinTransformer). To measure and estimate biodiversity, we relied on the Gini–Simpson Index, the Shannon–Wiener Index, and Species Richness. We present two separate scenarios for evaluating the readiness of the technology for practical use: the first scenario uses a subset of the data with five species and a testing set that has a very similar percentage of each species to those present in the training set. The models studied reach very high performances with over 99 Accuracy and 98 F1 Score (the harmonic mean of Precision and Recall) for image classification and biodiversity estimates under 1% error. The second scenario uses the full dataset with nine species and large variations in class balance between the training and testing datasets, which is often the case in practical use situations. The results in this case remained fairly high for Accuracy at 90.64% but dropped to 51.77% for F1 Score. The relatively low F1 Score value is partly due to a small number of misclassifications having a disproportionate impact in the final measure, but still, the large difference between the Accuracy and F1 Score highlights the complexity of finely evaluating the classification results of Deep Learning Networks. Even in this very challenging scenario, the biodiversity estimation remained with relatively small (6–14%) errors for the most detailed indices, showcasing the readiness of the technology for practical use.