ABSTRACT Detecting geospatial objects in large-scale, high-resolution optical satellite images is a substantial challenge, especially for small, time-sensitive targets such as airplanes and ships in cluttered scenes. Running a detector directly on each of the many image blocks is inefficient and produces more false alarms. In this paper, we introduce a hierarchical architecture that quickly locates candidate regions and then detects the targets within them. In the coarse layer, an improved saliency detection model uses geospatial priors and multi-level saliency features to find suspected regions in broad, complicated remote sensing images. In the fine layer, an end-to-end neural network predicts the categories and locations of the objects in each region. To improve detection performance, we design an enhanced network, adaptive multi-scale anchors, and an improved loss function to cope with the great diversity and complexity of backgrounds and targets. Experimental results on both a public dataset and our collected images validate the effectiveness of the proposed method. In particular, for large-scale images (more than 500 km²), the method far surpasses most unsupervised saliency models in region saliency detection and detects targets within 1 minute, with 95.0% recall and 93.2% precision on average.
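The coarse-to-fine idea can be illustrated with a minimal sketch. It is not the saliency model or the network described in this paper. The names `coarse_saliency`, `coarse_to_fine`, and `brightest_pixel_detector` are hypothetical. The saliency score is a stand-in based on local contrast, and the detector is a toy that only marks the brightest pixel. The sketch shows only the control flow: the fine-layer detector runs on blocks that pass the coarse layer, and block-local coordinates are mapped back to the full image.

```python
import numpy as np

def tile_origins(height, width, tile):
    """Yield (row, col) top-left corners of non-overlapping tiles."""
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield r, c

def coarse_saliency(block):
    """Stand-in saliency score (assumption): standard deviation of the block."""
    return float(block.std())

def coarse_to_fine(image, tile, threshold, detector):
    """Run `detector` only on tiles whose saliency score exceeds `threshold`."""
    detections = []
    for r, c in tile_origins(image.shape[0], image.shape[1], tile):
        block = image[r:r + tile, c:c + tile]
        if coarse_saliency(block) <= threshold:
            continue  # coarse layer rejects this block; the detector is skipped
        for y, x, label in detector(block):
            # Convert block-local coordinates to full-image coordinates.
            detections.append((r + y, c + x, label))
    return detections

def brightest_pixel_detector(block):
    """Toy fine-layer detector (assumption): reports the brightest pixel."""
    y, x = np.unravel_index(np.argmax(block), block.shape)
    return [(int(y), int(x), "target")]

# An 8x8 image with a single bright pixel; only one 4x4 tile is salient.
image = np.zeros((8, 8))
image[5, 6] = 1.0
result = coarse_to_fine(image, tile=4, threshold=0.1,
                        detector=brightest_pixel_detector)
print(result)
```

In this toy example three of the four tiles are uniform and are discarded by the coarse layer, so the detector is called once rather than four times.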