This paper proposes an automated method for creating semantic digital building models from dense point clouds and images. The method follows a hybrid bottom-up, top-down approach that combines the scene-understanding capabilities of artificial intelligence with domain engineering knowledge to overcome challenges in indoor 3D reconstruction. A pre-trained PointTransformer semantic segmentation model extracts thirteen classes of building objects, of which the wall and ceiling segments are used by a 3D space-parsing algorithm. A parameterized floor plan is then generated with a data-driven approach, and a volumetric digital model is created from it by extrusion. In addition, the YOLOv8 object detection network recognizes doors and windows in images derived from the projected points of each wall instance. Validation on six building datasets with different layouts demonstrates the effectiveness of the proposed reconstruction algorithm: the element parameters of the reconstructed models differ from those of the digital reference models by about 7 cm on average. These results highlight the potential of AI to automate the creation of digital models of the real world.
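The abstract does not specify how wall-instance points are turned into images for door and window detection. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each wall instance is a roughly planar, vertical set of (x, y, z) points in metres, finds the wall's horizontal direction with PCA, and rasterizes the points into a binary occupancy image. The function name `project_wall_to_image` and the `resolution` parameter are hypothetical; the actual pipeline may instead render colour images from RGB points.

```python
import numpy as np


def project_wall_to_image(points: np.ndarray, resolution: float = 0.01) -> np.ndarray:
    """Rasterize one wall instance into a 2D occupancy image (hypothetical helper).

    Assumes `points` is an (N, 3) array of x, y, z coordinates in metres
    belonging to a single, roughly planar, vertical wall.
    """
    # Horizontal principal direction of the wall from PCA on the x, y coordinates.
    xy = points[:, :2]
    centered = xy - xy.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]  # first right singular vector = dominant direction

    # Image axes: u runs along the wall, v is height above the lowest point.
    u = centered @ direction
    v = points[:, 2]
    u = u - u.min()
    v = v - v.min()

    # Convert metric coordinates to integer pixel indices.
    cols = np.floor(u / resolution).astype(int)
    rows = np.floor(v / resolution).astype(int)
    height = rows.max() + 1
    width = cols.max() + 1

    # Mark occupied pixels; flip rows so the top of the wall is the top of the image.
    image = np.zeros((height, width), dtype=np.uint8)
    image[height - 1 - rows, cols] = 255
    return image


if __name__ == "__main__":
    # Synthetic 1.95 m x 0.95 m wall in the plane y = 0 with a door-like opening.
    xs, zs = np.meshgrid(np.linspace(0.0, 1.95, 40), np.linspace(0.0, 0.95, 20))
    pts = np.column_stack([xs.ravel(), np.zeros(xs.size), zs.ravel()])
    opening = (pts[:, 0] > 0.5) & (pts[:, 0] < 1.0) & (pts[:, 2] < 0.7)
    img = project_wall_to_image(pts[~opening], resolution=0.1)
    print(img.shape)
```

Openings such as doors and windows appear as empty pixel regions in an image of this kind, which is the sort of input a 2D detector like YOLOv8 could consume; detected bounding boxes in pixel units can be mapped back to metres along the wall by multiplying by `resolution`.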