The Loess Plateau in China is known for its dense gully networks and complex terrain, which change rapidly under soil erosion and human activity and strongly influence the evolution of the regional ecological environment. Complex terrain and dense vegetation make precise terrain measurement and modeling difficult. Although unmanned aerial vehicle (UAV) light detection and ranging (LiDAR) scanning and photogrammetry have improved the precision of data acquisition, a single remote sensing technology still struggles to extract bare-earth information accurately. This study fuses UAV LiDAR scans with aerial photogrammetric imagery to generate detailed LiDAR point clouds that carry coordinate, reflectance, true-color, and texture information, which improves the classifiability and interpretability of the data. A point cloud classification model based on the Transformer architecture (Stratified Transformer) is then used to perform the initial extraction of ground points in complex gully terrain. To remove the non-ground noise that remains in the initial ground point cloud, a new point cloud classification optimization algorithm, Multi-scale C2M Distance Difference (MDD), is proposed. Because noise points are discrete and discontinuous with the ground surface, the algorithm eliminates them by analyzing the distances between the points and triangulated irregular networks (TINs) built at different scales, together with the differences between those distances. The study addresses the technical challenges of extracting ground points where complex terrain and vegetation are mixed, supports precise terrain measurement and intelligent data processing in complex gully terrain, and offers a new technical pathway for detecting geomorphological change.
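
The multi-scale distance idea behind MDD can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: C2M is read as cloud-to-mesh, the distance is simplified to the vertical height of a point above the TIN, each TIN is built from the lowest point in every square grid cell, and the function names (`lowest_point_per_cell`, `height_above_tin`, `mdd_filter`), the cell sizes, and both thresholds are hypothetical. The sketch assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator, NearestNDInterpolator


def lowest_point_per_cell(points, cell):
    """Return the lowest point in each square grid cell of side `cell`.

    Assumption: these points serve as TIN vertices for one scale.
    `points` is an (N, 3) float array of x, y, z coordinates.
    """
    xy = points[:, :2]
    ij = np.floor((xy - xy.min(axis=0)) / cell).astype(np.int64)
    # Sort by cell index first, then by height, so that the lowest
    # point of every cell is the first entry of its group.
    order = np.lexsort((points[:, 2], ij[:, 1], ij[:, 0]))
    ij_sorted = ij[order]
    is_first = np.ones(len(points), dtype=bool)
    is_first[1:] = np.any(ij_sorted[1:] != ij_sorted[:-1], axis=1)
    return points[order[is_first]]


def height_above_tin(points, vertices):
    """Signed vertical distance from each point to a TIN over `vertices`.

    Simplification: a true cloud-to-mesh distance is measured to the
    nearest triangle; the vertical offset is used here instead.
    """
    # LinearNDInterpolator triangulates the vertices (Delaunay) and
    # interpolates linearly inside each triangle, i.e. it evaluates a TIN.
    tin = LinearNDInterpolator(vertices[:, :2], vertices[:, 2])
    z_tin = tin(points[:, :2])
    # Points outside the convex hull get NaN; fall back to the nearest vertex.
    outside = np.isnan(z_tin)
    if outside.any():
        nearest = NearestNDInterpolator(vertices[:, :2], vertices[:, 2])
        z_tin[outside] = nearest(points[outside][:, :2])
    return points[:, 2] - z_tin


def mdd_filter(points, cells=(1.0, 4.0), max_dist=0.5, max_diff=0.3):
    """Return a boolean mask that is True for points kept as ground.

    Hypothetical rule: a point is kept only if its distance to the TIN
    stays small at every scale and varies little between scales.
    """
    dists = np.stack([
        height_above_tin(points, lowest_point_per_cell(points, cell))
        for cell in cells
    ])
    spread = dists.max(axis=0) - dists.min(axis=0)
    return (np.abs(dists).max(axis=0) <= max_dist) & (spread <= max_diff)
```

Calling `mdd_filter(points)` on an `(N, 3)` array of initial ground points returns a mask that can be applied as `points[mask]`. The cell sizes and thresholds are in the same units as the coordinates and would need tuning for steep gully slopes, where a coarse TIN built from lowest points departs from the true surface even for genuine ground points.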