Most 3D CityGML building models in street-view maps (e.g., Google, Baidu) lack texture information, which is generally acquired through photogrammetric techniques, such as unmanned aerial vehicle (UAV) mapping, to reconstruct real-scene 3D models. However, because CityGML building models are geometrically simplified and their location information can be inaccurate, the commonly used photogrammetric method based on a single data source cannot satisfy the requirements of texture mapping for CityGML building models. Furthermore, a single data source usually suffers from several problems, such as object occlusion. To alleviate these problems, we propose a novel approach that achieves CityGML building model texture mapping through multiview coplanar extraction from UAV remotely sensed or terrestrial images. A deep convolutional neural network is used to filter out occluding objects (e.g., pedestrians, vehicles, and trees) and obtain the building-texture distribution. Point-line-based features are then extracted to characterize multiview coplanar textures in 2D space under the constraint of a homography matrix, and a geometric topological analysis is subsequently applied to optimize texture boundaries using a strategy that combines the Hough transform and iterative least-squares methods. Experimental results show that the proposed approach enables texture mapping of building façades from 2D terrestrial images without requiring exterior orientation information; that is, unlike the photogrammetric method, the collinearity equation is not essential for capturing texture information. In addition, the proposed approach largely eliminates blurred and distorted textures of building models, making it suitable for automatic and rapid texture updating.
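To make the core idea of the pipeline concrete, the following is a minimal, hedged sketch (not the authors' implementation) of two of the steps summarized above: warping coplanar façade texture from a terrestrial image into a rectified texture plane under a homography constraint, and refining a boundary line with a Hough transform followed by iterative least-squares fitting. It uses standard OpenCV/NumPy calls; the function names, parameter values, and the synthetic test quad are illustrative assumptions only.

```python
# Hedged sketch: homography-constrained facade texture extraction plus
# Hough + iterative least-squares boundary refinement (illustrative only).
import cv2
import numpy as np

def extract_facade_texture(image, img_pts, facade_pts, tex_size):
    """Warp the facade region of `image` into a rectified texture.

    img_pts    -- Nx2 pixel coordinates of coplanar facade features in the image
    facade_pts -- Nx2 target coordinates on the facade plane (texture space)
    tex_size   -- (width, height) of the output texture
    """
    # All coplanar correspondences obey one homography; RANSAC rejects outliers.
    H, inliers = cv2.findHomography(img_pts, facade_pts, cv2.RANSAC, 3.0)
    texture = cv2.warpPerspective(image, H, tex_size)
    return texture, H, inliers

def refine_boundary(texture_gray, n_iter=3, sigma_factor=2.0):
    """Detect a dominant boundary with the Hough transform, then refine it by
    iterative least squares, discarding gross outlier points at each pass."""
    edges = cv2.Canny(texture_gray, 50, 150)
    segments = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                               minLineLength=40, maxLineGap=10)
    if segments is None:
        return None
    # Use segment endpoints as candidate boundary points (x, y).
    pts = segments.reshape(-1, 4).astype(float)
    xs = np.concatenate([pts[:, 0], pts[:, 2]])
    ys = np.concatenate([pts[:, 1], pts[:, 3]])
    keep = np.ones(xs.shape[0], dtype=bool)
    a = b = 0.0
    for _ in range(n_iter):
        if keep.sum() < 2:
            break
        # Least-squares fit of y = a*x + b on currently kept points.
        A = np.stack([xs[keep], np.ones(int(keep.sum()))], axis=1)
        (a, b), *_ = np.linalg.lstsq(A, ys[keep], rcond=None)
        resid = np.abs(a * xs + b - ys)
        keep = resid < sigma_factor * resid[keep].std() + 1e-6
    return a, b

if __name__ == "__main__":
    # Synthetic example: a tilted facade quad with known coplanar correspondences.
    img = np.zeros((480, 640, 3), np.uint8)
    quad = np.float32([[120, 90], [520, 140], [500, 420], [100, 380]])
    cv2.fillConvexPoly(img, quad.astype(np.int32), (180, 180, 180))
    tex_corners = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])
    tex, H, _ = extract_facade_texture(img, quad, tex_corners, (400, 300))
    gray = cv2.cvtColor(tex, cv2.COLOR_BGR2GRAY)
    print("boundary line (a, b):", refine_boundary(gray))
```

In the actual approach, the point-line correspondences would come from multiview feature matching on occlusion-filtered images rather than from a hand-specified quad, and the boundary refinement would operate on the detected façade outlines; the sketch only mirrors the geometric constraints described in the abstract.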