Using unmanned aerial vehicles (UAVs) for remote sensing has the advantages of high flexibility, convenient operation, low cost, and wide application range. It fills the need for rapid acquisition of high-resolution aerial images in modern photogrammetry applications. Due to the insufficient parallaxes and the computation-intensive process, dense real-time reconstruction for large terrain scenes is a considerable challenge. To address these problems, we proposed a novel SLAM-based MVS (Multi-View-Stereo) approach, which can incrementally generate a dense 3D (three-dimensional) model of the terrain by using the continuous image stream during the flight. The pipeline of the proposed methodology starts with pose estimation based on SLAM algorithm. The tracked frames were then selected by a novel scene-adaptive keyframe selection method to construct a sliding window frame-set. This was followed by depth estimation using a flexible search domain approach, which can improve accuracy without increasing the iterate time or memory consumption. The whole system proposed in this study was implemented on the embedded GPU based on an UAV platform. We proposed a highly parallel and memory-efficient CUDA-based depth computing architecture, enabling the system to achieve good real-time performance. The evaluation experiments were carried out in both simulation and real-world environments. A virtual large terrain scene was built using the Gazebo simulator. The simulated UAV equipped with an RGB-D camera was used to obtain synthetic evaluation datasets, which were divided by flight altitudes (800-, 1000-, 1200 m) and terrain height difference (100-, 200-, 300 m). In addition, the system has been extensively tested on various types of real scenes. Comparison with commercial 3D reconstruction software is carried out to evaluate the precision in real-world data. According to the results on the synthetic datasets, over 93.462% of the estimation with absolute error distance of less then 0.9%. In the real-world dataset captured at 800 m flight height, more than 81.27% of our estimated point cloud are less then 5 m difference with the results of Photoscan. All evaluation experiments show that the proposed approach outperforms the state-of-the-art ones in terms of accuracy and efficiency.