Abstract

In damage-level classification, deep learning models are more likely to focus on regions unrelated to the classification target because of the complexities inherent in real data, such as the diversity of damage types (e.g., cracks, efflorescence, and corrosion). This causes performance degradation. Solving this problem requires handling both data complexity and uncertainty. This study proposes a multimodal deep learning model that can focus on damaged regions by using text data related to the damage in images, such as materials and components. Furthermore, by adjusting the influence of the attention maps on damage-level classification according to the confidence computed when estimating these maps, the proposed method achieves accurate damage-level classification. Our contribution is a model with an end-to-end multimodal attention mechanism that simultaneously considers text data, image data, and the confidence of the attention map. Finally, experiments on real images validate the effectiveness of the proposed method.
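The mechanism the abstract describes — a text-guided attention map whose influence on classification is scaled by an estimated confidence — can be illustrated with a minimal sketch. This is not the authors' architecture; all function names, the similarity-based attention, and the entropy-based confidence measure are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def confidence_weighted_attention(img_feats, text_emb):
    """Hypothetical sketch of confidence-weighted multimodal attention.

    img_feats: (N, D) feature vectors for N image regions
    text_emb:  (D,)   embedding of damage-related text (e.g., material, component)
    """
    # Attention map: similarity between each image region and the text embedding
    scores = img_feats @ text_emb              # (N,)
    attn = softmax(scores)                     # attention map over regions

    # Confidence proxy (assumption): a peaked map (low entropy) -> high confidence
    entropy = -np.sum(attn * np.log(attn + 1e-12))
    conf = 1.0 - entropy / np.log(len(attn))   # normalized to [0, 1]

    # Scale the attention map's effect by its confidence: blend attended
    # pooling with plain average pooling of the image features
    attended = attn @ img_feats                # (D,) attention-weighted pooling
    uniform = img_feats.mean(axis=0)           # (D,) confidence-free fallback
    fused = conf * attended + (1.0 - conf) * uniform
    return fused, attn, conf
```

The fused feature would then feed a damage-level classifier; when the attention map is unreliable, the blend falls back toward an unweighted summary of the image.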

