Interpretation of large-scale aerial imagery remains a fundamental challenge due to the dense distribution of objects and complex intra-class backgrounds. Non-local relational modeling has been adopted as the mainstream solution for interpreting such large-scale scenes. Existing non-local relation modeling methods build deep convolutional neural networks, attention mechanisms, or deep graph neural networks over all positions of the whole image, which makes them computationally expensive and inefficient. In this paper, we investigate the main causes of this inefficiency and identify two key disadvantages of existing methods: (1) the learned semantic relations are blurry and homogeneous, and (2) high-order relations are insufficiently constructed. To overcome these issues, we analyze the inadequacy of the traditional hypergraph definition and propose a lightweight hypergraph construction strategy to learn non-local relations. The proposed method models high-order relations more effectively and yields an explicit representation of semantic information. We apply this strategy in the spatial dimension and construct a fully weighted hypergraph neural network (HGNN) to capture short- and long-range dependencies in large-scale aerial images. Furthermore, we design a hypergraph convolutional feature pyramid network (Hyper-FPN), which learns non-local relations in multi-scale features and then aggregates hierarchical global contexts. Extensive experiments on geospatial visual recognition demonstrate that Hyper-FPN, equipped with our construction strategy, significantly improves performance. Moreover, our approach can be easily embedded into state-of-the-art (SOTA) architectures to achieve higher performance.
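For readers unfamiliar with hypergraph neural networks, the sketch below shows the standard spectral hypergraph convolution that HGNN-style layers build on, X' = D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X Θ, where H is the vertex-hyperedge incidence matrix and W holds the hyperedge weights. This is a minimal NumPy illustration of the generic operator, not the paper's lightweight construction strategy or its fully weighted variant; all names here are illustrative.

```python
import numpy as np

def hypergraph_conv(X, H, w_e, Theta):
    """One generic hypergraph convolution layer.

    X:     (N, C)  node feature matrix
    H:     (N, E)  incidence matrix (H[v, e] = 1 if node v is in hyperedge e)
    w_e:   (E,)    hyperedge weights
    Theta: (C, C') learnable projection
    Computes X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta.
    """
    dv = H @ w_e                      # weighted vertex degrees, shape (N,)
    de = H.sum(axis=0)                # hyperedge degrees, shape (E,)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    De_inv = np.diag(1.0 / de)
    # Normalized hypergraph adjacency (propagation) matrix
    A = Dv_inv_sqrt @ H @ np.diag(w_e) @ De_inv @ H.T @ Dv_inv_sqrt
    return A @ X @ Theta

# Toy example: 5 nodes grouped by 3 hyperedges
H = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [0, 1, 1]], dtype=float)
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
out = hypergraph_conv(X, H, np.ones(3), rng.standard_normal((4, 2)))
print(out.shape)  # (5, 2)
```

In the spatial setting described above, the nodes would correspond to feature-map positions and the hyperedges to learned groups of semantically related positions, which is what lets a single layer mix short- and long-range context.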