This work presents a numerical mesh generation method for 3D urban scenes that could be easily converted into any 3D format, different from most implementations which are limited to specific environments in their applicability. The building models have shaped roofs and faces with static colors, combining the buildings with a ground grid. The building generation uses geographic positions and shape names, which can be extracted from OpenStreetMap. Additional steps, like a computer vision method, can be integrated into the generation optionally to improve the quality of the model, although this is highly time-consuming. Its function is to classify unknown roof shapes from satellite images with adequate resolution. The generation can also use custom geographic information. This aspect was tested using information created by procedural processes. The method was validated by results generated for many realistic scenarios with multiple building entities, comparing the results between using computer vision and not. The generated models were attempted to be rendered under Graphics Library Transmission Format and Unity Engine. In future work, a polygon-covering algorithm needs to be completed to process the building footprints more effectively, and a solution is required for the missing height values in OpenStreetMap.