Abstract

Floor plan analysis and vectorization are of practical importance in real estate and interior design. Analysis usually serves as a preliminary step to vectorization by extracting structural elements and room layouts. However, existing analysis methods focus mainly on the visual modality, which is insufficient for identifying rooms because it lacks semantic clues about room types. On the other hand, standard floor plan images carry rich textual annotations that provide semantic guidance on room layouts. Motivated by this fact, we propose a multimodal segmentation network, (OCR)$$^2$$, that exploits additional textual information for the analysis of floor plan images. Specifically, we extract texts that indicate the room layouts with optical character recognition (OCR) and fuse them with visual features via a cross-attention mechanism. Thereafter, we further improve the efficiency of the state-of-the-art vectorization method by (1) replacing its gradient-descent steps with fast principal component analysis (PCA) when converting doors and windows, and (2) removing unnecessary iterative steps when extracting room contours. Both quantitative and qualitative experiments validate the effectiveness and efficiency of our proposed method.
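To illustrate the PCA step mentioned in the abstract, here is a minimal sketch of how a door or window segment can be converted to a vector primitive without gradient descent: given a binary pixel mask of the element, an eigen-decomposition of the coordinate covariance yields the dominant axis, and projecting the pixels onto that axis gives the segment endpoints. The function name and the mask-to-segment interface are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def principal_axis_endpoints(mask: np.ndarray):
    """Fit the dominant axis of a binary mask via PCA and return the
    two endpoints of the segment along that axis.

    Illustrative sketch; the paper's actual interface may differ.
    """
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    center = pts.mean(axis=0)
    centered = pts - center
    # Covariance of pixel coordinates; its top eigenvector is the axis
    # of largest variance (the long direction of the door/window).
    cov = centered.T @ centered / len(pts)
    eigvals, eigvecs = np.linalg.eigh(cov)
    axis = eigvecs[:, np.argmax(eigvals)]
    # Project points onto the axis; the extremes give the endpoints.
    proj = centered @ axis
    p0 = center + proj.min() * axis
    p1 = center + proj.max() * axis
    return p0, p1

# Example: a horizontal 2x16-pixel bar, as a window might appear.
mask = np.zeros((10, 20), dtype=bool)
mask[4:6, 2:18] = True
p0, p1 = principal_axis_endpoints(mask)
```

Because this is a closed-form eigen-decomposition on a small 2x2 covariance matrix, it runs in a single pass over the mask pixels, which is where the efficiency gain over iterative gradient-descent fitting comes from.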
