Recent advances in camera-equipped drone applications increased the demand for visual object detection algorithms with deep learning for aerial images. There are several limitations in accuracy for a single deep learning model. Inspired by ensemble learning can significantly improve the generalization ability of the model in the machine learning field, we introduce a novel integration strategy to combine the inference results of two different methods without non-maximum suppression. In this paper, a global and local ensemble network (GLE-Net) was proposed to increase the quality of predictions by considering the global weights for different models and adjusting the local weights for bounding boxes. Specifically, the global module assigns different weights to models. In the local module, we group the bounding boxes that corresponding to the same object as a cluster. Each cluster generates a final predict box and assigns the highest score in the cluster as the score of the final predict box. Experiments on benchmarks VisDrone2019 show promising performance of GLE-Net compared with the baseline network.