Abstract
This article proposes a swarm intelligence ensemble deep transfer learning method (named SI‐EDTL) for multiple vehicle detection in unmanned aerial vehicle (UAV) images. The method is based on the Faster region‐based convolutional neural network (Faster R‐CNN), in which a region proposal network (RPN) extracts a set of region proposals and a CNN then mines highly descriptive features from these windows to classify them. We use three Faster R‐CNNs whose feature extractors (InceptionV3, ResNet50, and GoogLeNet) have been pre‐trained on ImageNet, combined with five transfer classifiers (KNN, SVM, MLP, C4.5 Decision Tree, and Naïve Bayes). As a result, 15 different base learners (3 feature extractors × 5 classifiers) are trained through deep transfer learning on a UAV dataset to classify the region proposals into multiple vehicle types (car, van, truck, and bus). These 15 base learners are combined through weighted averaging aggregation, which assigns each region proposal to one of the four vehicle classes or to no vehicle (background). The hyperparameters of the ensemble model are tuned using the whale optimization algorithm (WOA) to achieve the best trade‐off among total accuracy, precision, and recall. The proposed SI‐EDTL model was implemented with parallel processing in MATLAB R2020b. Experimental results on the AU‐AIR dataset of UAV images show that SI‐EDTL outperforms existing techniques.
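The abstract names two computational steps without giving formulas: fusing the 15 base learners by weighted averaging and tuning the ensemble with the whale optimization algorithm. The Python/NumPy sketch below (the authors worked in MATLAB R2020b) shows one way these steps could fit together. It rests on assumptions that the abstract does not state: the tuned hyperparameters are taken to be the 15 averaging weights, the fitness is an equal‐weight mean of accuracy, macro precision, and macro recall, and WOA uses its standard encircling and spiral updates. The function names `weighted_average`, `fitness`, and `woa`, and the synthetic data in the demo, are hypothetical and are not results from the paper.

```python
import numpy as np

# Label set from the abstract: four vehicle classes plus background.
CLASSES = ["car", "van", "truck", "bus", "background"]


def weighted_average(probs, weights):
    """Fuse base-learner class probabilities by weighted averaging.

    probs:   array of shape (n_learners, n_samples, n_classes)
    weights: one non-negative weight per base learner (hypothetical hyperparameters)
    Returns the index of the highest fused score for each sample.
    """
    probs = np.asarray(probs, dtype=float)
    w = np.clip(np.asarray(weights, dtype=float), 0.0, None)
    if w.sum() == 0:  # all-zero weights: fall back to a plain average
        w = np.ones_like(w)
    w = w / w.sum()
    fused = np.tensordot(w, probs, axes=1)  # shape (n_samples, n_classes)
    return fused.argmax(axis=1)


def fitness(weights, probs, y):
    """Assumed objective: equal-weight mean of accuracy, macro precision and macro recall."""
    pred = weighted_average(probs, weights)
    labels = np.union1d(y, pred)  # classes occurring in the labels or the predictions
    tp = [np.sum((pred == c) & (y == c)) for c in labels]
    prec = [t / max(np.sum(pred == c), 1) for t, c in zip(tp, labels)]
    rec = [t / max(np.sum(y == c), 1) for t, c in zip(tp, labels)]
    return float((np.mean(pred == y) + np.mean(prec) + np.mean(rec)) / 3.0)


def woa(fitness_fn, dim, n_whales=20, n_iter=50, lb=0.0, ub=1.0, b=1.0, seed=0):
    """Basic whale optimization algorithm (maximization) over the box [lb, ub]^dim."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_whales, dim))
    scores = [fitness_fn(x) for x in X]
    best = X[int(np.argmax(scores))].copy()
    best_score = max(scores)
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 toward 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale (exploitation);
                # otherwise move relative to a random whale (exploration).
                ref = best if abs(A) < 1 else X[rng.integers(n_whales)].copy()
                X[i] = ref - A * np.abs(C * ref - X[i])
            else:
                # Bubble-net attack: logarithmic spiral around the best whale.
                l = rng.uniform(-1.0, 1.0)
                X[i] = np.abs(best - X[i]) * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lb, ub)
            s = fitness_fn(X[i])
            if s > best_score:
                best, best_score = X[i].copy(), s
    return best, best_score


if __name__ == "__main__":
    # Synthetic stand-in for 15 base learners (3 extractors x 5 classifiers).
    rng = np.random.default_rng(42)
    n_learners, n_samples, n_classes = 15, 200, len(CLASSES)
    y = rng.integers(n_classes, size=n_samples)
    noise = rng.uniform(0.5, 3.0, size=n_learners)  # each learner has its own noise level
    logits = 2.0 * np.eye(n_classes)[y][None] + noise[:, None, None] * rng.standard_normal(
        (n_learners, n_samples, n_classes))
    probs = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)  # per-sample softmax
    best_w, best_f = woa(lambda w: fitness(w, probs, y), dim=n_learners)
    print("tuned weights:", np.round(best_w, 2))
    print("fitness (tuned):", round(best_f, 3),
          "| fitness (uniform):", round(fitness(np.ones(n_learners), probs, y), 3))
```

Normalizing the weights to sum to one keeps the fused scores on the same scale as each learner's probabilities; the arg‐max over the five classes (including background) is unaffected by any overall positive scaling of the weights that WOA proposes.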