Abstract

Power industry technical standard retrieval is an information retrieval problem: given a query phrase or sentence, the goal is to find the relevant candidate documents. Matching queries to the corpus correctly, while exploiting both term frequency and semantic features, is critical for this task. In this work, we propose a novel solution based on term frequency (TF) and semantic similarity (SS) matching. It consists of three key components: (1) TF matching mines the significance of a particular term within the overall document while filtering out content that does not contain the query terms, which makes it well suited to accurate retrieval. (2) SS matching is formulated as a deep metric learning framework that jointly learns sentence representations and a semantic embedding metric. (3) Reranking fuses the results of the TF and SS models to refine the output ranking list. Extensive experiments on a dataset of 38 technical standards frequently used in daily work demonstrate the effectiveness of the proposed method. Our SS matching achieves results comparable to those of the more complex poly-encoders. Combined with TF matching, our TF-SS matching achieves state-of-the-art performance with 80.84% recall, 84% MRR, and 79.82% ROUGE.
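
The three components can be illustrated with a minimal sketch. It is not the implementation evaluated in this work. The abstract does not specify the scoring formulas or the fusion rule, so everything below is an assumption made for illustration: the function names (`tf_score`, `embed`, `cosine`, `rerank`), the weighted-sum fusion, and the weight `alpha`. In particular, `embed` is a bag-of-words placeholder standing in for the learned sentence encoder.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems need more)."""
    return text.lower().split()


def tf_score(query, doc):
    """Relative frequency of the query terms in the document.

    Returns 0.0 when no query term occurs, which is how this sketch
    mimics "filtering out content without the queries".
    """
    tokens = tokenize(doc)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[t] for t in set(tokenize(query))) / len(tokens)


def embed(text):
    """Placeholder for a learned sentence encoder: a term-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank(query, docs, alpha=0.5):
    """Fuse normalized TF and SS scores, then sort documents by fused score.

    alpha is a hypothetical fusion weight: 1.0 uses TF only, 0.0 uses SS only.
    Returns document indices, best first; ties keep the original order.
    """
    tf = [tf_score(query, d) for d in docs]
    max_tf = max(tf, default=0.0) or 1.0  # avoid dividing by zero
    q_vec = embed(query)
    scored = []
    for i, doc in enumerate(docs):
        ss = cosine(q_vec, embed(doc))
        fused = alpha * (tf[i] / max_tf) + (1 - alpha) * ss
        scored.append((fused, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored]


docs = [
    "annual leave policy",
    "transformer oil insulation test requirements",
    "cable laying depth requirements",
]
ranking = rerank("transformer oil test", docs)
print(ranking)
```

With the placeholder encoder, the SS score collapses to term overlap, so the two signals are strongly correlated here. A trained encoder would also let SS score paraphrases that share no terms with the query, which is the case the fusion step is meant to cover.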

