Intelligent bridge maintenance requires comprehensive use of inspection records, which contain valuable insights into structures' long-term service conditions. Reliable extraction methods are needed to locate and use this information efficiently in lengthy reports. However, existing information extraction methods for bridge inspection reports rely heavily on large amounts of manually annotated data, which limits their applicability and practicality, especially given the non-uniform formats and expressions of the reports. To address this issue, this study proposed a few-shot information extraction model for understanding bridge inspection reports, comprising Word and Structure Integration Embeddings, Bi-directional Long Short-Term Memory (BiLSTM), and a Conditional Random Field (CRF). The model's strength lies in its easy-to-implement word-structure embedding approach, which combines domain-specific word representations with sentence structure information. Specifically, the bridge inspection-domain pre-trained language model was further pre-trained and fine-tuned to obtain word embeddings that carry prior knowledge of the domain and tasks. Moreover, a novel encoding method was designed to generate sentence structure embeddings from dependency syntactic analysis results, providing textual representation information. Finally, the integrated word-structure embeddings, created by aligning dimensions for concatenation, were fed into the BiLSTM-CRF architecture to capture contextual dependencies and constrain extraction results. Empirical evaluations conducted on four few-shot datasets with 10, 30, 50, and 100 samples demonstrate that the proposed model achieved high accuracy and F1 score, outperforming prior methods, general domain models, and large language models.
Specifically, on the dataset containing 50 sentences, our model achieved an accuracy of up to 0.9357 and an F1 score of 0.8683, an average improvement of 38.4% over these methods. Ablation experiments revealed the contribution of each model component. These results suggest that the proposed model can accurately extract key information from bridge inspection reports even when training data are limited, thereby facilitating applications such as structural condition evaluation and maintenance decision-making.
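
To make the word-structure integration step concrete, the following is a minimal sketch and not the encoding used in the study. It assumes that "aligning dimensions" means producing one structure vector per token so that it can be concatenated with that token's word vector. The relation inventory `DEP_RELATIONS`, the functions `structure_embedding` and `integrate`, the offset clipping, and the toy vectors are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical dependency-relation inventory (assumption: the study's
# actual label set and encoding scheme are not given in the abstract).
DEP_RELATIONS = ["root", "nsubj", "obj", "amod", "nmod", "other"]


def structure_embedding(head_offset: int, relation: str,
                        max_offset: int = 4) -> List[float]:
    """Encode one token's dependency arc as a fixed-length vector.

    The vector is [scaled head offset] + one-hot(relation), where the
    offset (head index minus token index) is clipped to +/- max_offset.
    """
    clipped = max(-max_offset, min(max_offset, head_offset)) / max_offset
    rel = relation if relation in DEP_RELATIONS else "other"
    one_hot = [1.0 if r == rel else 0.0 for r in DEP_RELATIONS]
    return [clipped] + one_hot


def integrate(word_vecs: List[List[float]],
              arcs: List[Tuple[int, str]]) -> List[List[float]]:
    """Concatenate each token's word vector with its structure vector."""
    if len(word_vecs) != len(arcs):
        raise ValueError("need exactly one dependency arc per token")
    return [w + structure_embedding(off, rel)
            for w, (off, rel) in zip(word_vecs, arcs)]


# Toy sentence: "Deck shows transverse cracks" (values are illustrative).
word_vecs = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6],
             [0.7, 0.8, 0.9], [0.2, 0.1, 0.0]]
arcs = [(1, "nsubj"), (0, "root"), (1, "amod"), (-2, "obj")]
features = integrate(word_vecs, arcs)
print(len(features), len(features[0]))
```

Each row of `features` would then serve as the per-token input to a sequence tagger such as a BiLSTM-CRF. In practice the word vectors would come from the domain-adapted language model rather than hand-written numbers.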