Object detection is a basic issue of very high-resolution remote sensing images (RSIs) for automatically labeling objects. At present, deep learning has gradually gained the competitive advantage for remote sensing object detection, especially based on convolutional neural networks (CNNs). Most of the existing methods use the global information in the fully connected feature vector and ignore the local information in the convolutional feature cubes. However, the local information can provide spatial information, which is helpful for accurate localization. In addition, there are variable factors, such as rotation and scaling, which affect the object detection accuracy in RSIs. In order to solve these problems, this paper presents a hierarchical robust CNN. First, multiscale convolutional features are extracted to represent the hierarchical spatial semantic information. Second, multiple fully connected layer features are stacked together so as to improve the rotation and scaling robustness. Experiments on two data sets have shown the effectiveness of our method. In addition, a large-scale high-resolution remote sensing object detection data set is established to make up for the current situation that the existing data set is insufficient or too small. The data set is available at https://github.com/CrazyStoneonRoad/TGRS-HRRSD-Dataset .