Precise and complete 3D representations of architectural structures or industrial sites are essential for various applications, including structural monitoring or cadastre. However, acquiring these datasets can be time-consuming, particularly for large objects. Mobile scanning systems offer a solution for such cases. In the case of complex scenes, multiple scanning systems are required to obtain point clouds that can be merged into a comprehensive representation of the object. Merging individual point clouds obtained from different sensors or at different times can be difficult due to discrepancies caused by moving objects or changes in the scene over time, such as seasonal variations in vegetation. In this study, we present the integration of point clouds obtained from two mobile scanning platforms within a built-up area. We utilized a combination of a quadruped robot and an unmanned aerial vehicle (UAV). The PointNet++ network was employed to conduct a semantic segmentation task, enabling the detection of non-ground objects. The experimental tests used the Toronto 3D dataset and DALES for network training. Based on the performance, the model trained on DALES was chosen for further research. The proposed integration algorithm involved semantic segmentation of both point clouds, dividing them into square subregions, and performing subregion selection by checking the emptiness or when both subregions contained points. Parameters such as local density, centroids, coverage, and Euclidean distance were evaluated. Point cloud merging and augmentation enhanced with semantic segmentation and clustering resulted in the exclusion of points associated with these movable objects from the point clouds. The comparative analysis of the method and simple merging was performed based on file size, number of points, mean roughness, and noise estimation. The proposed method provided adequate results with the improvement of point cloud quality indicators.