Optical remote sensing images (ORSI) are used in many fields, and salient object detection in ORSI (ORSI-SOD) has become an important research topic in recent years. However, ORSI-SOD remains challenging because ORSI exhibit cluttered and highly variable backgrounds, large differences in object scale, and diverse topological shapes. In this paper, we propose a novel model called the multi-scale feature refinement aggregation network (MFANet), which consists of a multi-scale feature refinement module (MFR) and a context aggregation module (CFA). The MFR module operates in two stages. In the multi-scale feature extraction stage, it extracts semantic information from ORSI across different dimensions. In the feature refinement stage, the proposed self-refinement module progressively refines the prediction results under the guidance of attention and reverse attention. The CFA module uses a hybrid attention module to progressively aggregate the outputs of the context extraction module and extract salient regions from them. To adapt to dense scenes, we develop a hybrid loss function that lets the network optimize multi-scale objectives in a self-adaptive manner. In terms of accuracy, our method outperforms most recent state-of-the-art salient object detection methods.
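As a rough illustration of refinement guided by attention and reverse attention, the sketch below applies both weightings to a single-channel feature map and adds the result to a coarse prediction as a residual. It is not the MFANet implementation: the function names, the scalar gains `fg_gain` and `bg_gain` (stand-ins for learned convolution branches), and the residual form are all assumptions made for illustration, following the generic reverse-attention idea in which the weight is one minus the sigmoid of the coarse prediction.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def attention_weights(coarse_logits):
    """Attention: high where the coarse map already predicts saliency."""
    return [[sigmoid(v) for v in row] for row in coarse_logits]


def reverse_attention_weights(coarse_logits):
    """Reverse attention: high where the coarse map predicts background,
    so the refinement can focus on regions that were missed."""
    return [[1.0 - sigmoid(v) for v in row] for row in coarse_logits]


def refine(coarse_logits, feature, fg_gain=0.5, bg_gain=1.0):
    """One hypothetical refinement step on 2-D lists of equal shape.

    fg_gain and bg_gain are assumed scalars standing in for the learned
    branches a real network would apply to each weighted feature map.
    """
    att = attention_weights(coarse_logits)
    rev = reverse_attention_weights(coarse_logits)
    refined = []
    for c_row, f_row, a_row, r_row in zip(coarse_logits, feature, att, rev):
        refined.append([
            # residual update: coarse logit plus the two weighted branches
            c + fg_gain * f * a + bg_gain * f * r
            for c, f, a, r in zip(c_row, f_row, a_row, r_row)
        ])
    return refined


if __name__ == "__main__":
    coarse = [[4.0, -4.0], [0.0, 0.0]]   # confident fg, confident bg, uncertain
    feature = [[1.0, 1.0], [2.0, -2.0]]  # toy single-channel feature map
    for row in refine(coarse, feature):
        print([round(v, 3) for v in row])
```

In this toy setting, the reverse-attention branch contributes almost nothing where the coarse map is already confidently salient and contributes most where the coarse map predicts background, which is the behaviour the guidance is meant to provide.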