This manuscript focuses on integrating optical images and laser point clouds, acquired by sensors carried on low-cost UAVs, into an automated system capable of generating city models. After pre-processing, we co-registered the two datasets using the direct linear transformation (DLT) model. We estimated structure heights from the LiDAR dataset by applying a progressive morphological filter and then removing the bare ground. Unsupervised and supervised image classification techniques were applied to a six-band image created from the optical and LiDAR datasets. After finding building footprints, we traced their edges, outlined their borderlines, and identified their geometric boundaries through several image processing and rule-based feature identification algorithms. A comparison between manually digitized and automatically extracted buildings showed a detection rate of about 92.3 % with an average of 7.4 % falsely identified areas for the six-band image, whereas classifying the RGB image alone detected about 63.2 % of the building pixels with 25.3 % of pixels incorrectly identified. Moreover, the building detection rate with the six-band image was superior to that attained by performing traditional image segmentation on the LiDAR DEM alone. Shifts in the horizontal coordinates between corner points identified by a human operator and those detected by the proposed system were in the range of 10–15 cm. This is an improvement over traditional large satellite and manned-aerial mapping systems, which have lower accuracies due to sensor limitations and platform altitude. These findings demonstrate the benefits of fusing multiple UAV remote sensing datasets over utilizing a single dataset for urban area mapping and 3D city modeling.
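
The detection and false-identification percentages reported above can be illustrated with a minimal pixel-based sketch. The abstract does not give the exact formulas, so the definitions here are assumptions: the detection rate is taken as the share of reference (manually digitized) building pixels that were also extracted automatically, and the false-identification rate as the share of automatically extracted pixels that are not buildings in the reference. The function name `building_detection_rates` and the nested-list mask format are hypothetical and chosen only for illustration.

```python
def building_detection_rates(reference, detected):
    """Compare two same-sized binary building masks (nested lists of 0/1).

    Returns (detection_rate, false_rate) in percent.
    Assumed definitions (not taken from the manuscript):
      detection_rate = TP / (TP + FN)   # reference pixels that were found
      false_rate     = FP / (TP + FP)   # extracted pixels that are wrong
    """
    tp = fp = fn = 0
    for ref_row, det_row in zip(reference, detected):
        for ref_px, det_px in zip(ref_row, det_row):
            if ref_px and det_px:
                tp += 1          # building in both masks
            elif det_px:
                fp += 1          # extracted, but not a reference building
            elif ref_px:
                fn += 1          # reference building that was missed

    # Guard against empty masks to avoid division by zero.
    detection_rate = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    false_rate = 100.0 * fp / (tp + fp) if (tp + fp) else 0.0
    return detection_rate, false_rate


# Toy example: 4 reference building pixels, 4 extracted pixels.
reference_mask = [[1, 1, 0],
                  [1, 1, 0]]
detected_mask = [[1, 1, 1],
                 [1, 0, 0]]
det_rate, false_rate = building_detection_rates(reference_mask, detected_mask)
print(f"detection rate: {det_rate:.1f} %, falsely identified: {false_rate:.1f} %")
```

In the toy example, three of the four reference pixels are recovered and one of the four extracted pixels is spurious. The same comparison could be run on the six-band and RGB-only classification results to contrast the two inputs, provided both masks share the grid of the manually digitized reference.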