TF-SS Matching: Retrieval of Power Industry Technical Standards via Term Frequency and Semantic Similarity Matching

Han Sun, Zhantao Su, Jing Wang, Kunlun Li, Xin Zhang, Jun Liao

doi:10.1109/icpds54746.2021.9690073

Abstract

Power industry technical standard retrieval is an information retrieval problem: given a query phrase or sentence, retrieve the relevant candidate documents. Correctly matching queries to the corpus while exploiting both term frequency and semantic features is critical for this task. In this work, we propose a novel solution based on term frequency (TF) and semantic similarity (SS) matching. It consists of three key components: (1) TF matching measures the significance of a particular term within the overall document while filtering out content that does not contain the query terms, which makes it suitable for exact-match retrieval. (2) SS matching is formulated as a deep metric learning framework that jointly learns the sentence representations and the semantic embedding metric. (3) Reranking fuses the results of the TF and SS models to refine the output ranking list. Extensive experiments on a dataset containing 38 technical standards frequently used in daily work demonstrate the effectiveness of the proposed method. Our SS matching achieves results comparable to those of the more complex poly-encoders. Combined with TF matching, our TF-SS matching achieves state-of-the-art performance with 80.84% recall, 84% MRR, and 79.82% ROUGE.
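The three components can be illustrated with a minimal Python sketch. It is not the implementation described here: the function names, the relative-frequency TF score, the max normalization, the weight `alpha`, and the hand-written two-dimensional vectors are all assumptions made for illustration. In particular, plain cosine similarity over fixed vectors stands in for the learned deep metric model used for SS matching.

```python
import math
from collections import Counter


def tf_score(query_terms, doc_tokens):
    """Relative frequency of the query terms in a document (0.0 if none occur)."""
    if not doc_tokens:
        return 0.0
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in query_terms) / len(doc_tokens)


def cosine(u, v):
    """Cosine similarity; a stand-in for a learned semantic similarity metric."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank(query_terms, query_vec, docs, doc_vecs, alpha=0.5):
    """Rank document ids by a weighted sum of TF and SS scores.

    alpha is a hypothetical fusion weight; the max normalization of the
    TF scores is likewise an assumption of this sketch.
    """
    tf = {d: tf_score(query_terms, toks) for d, toks in docs.items()}
    max_tf = max(tf.values(), default=0.0) or 1.0
    fused = {
        d: alpha * tf[d] / max_tf + (1 - alpha) * cosine(query_vec, doc_vecs[d])
        for d in docs
    }
    return sorted(docs, key=lambda d: fused[d], reverse=True)


# Toy corpus with hand-written vectors (hypothetical data).
docs = {
    "a": ["transformer", "oil", "temperature", "limit"],
    "b": ["cable", "insulation", "test"],
    "c": ["transformer", "transformer", "rating", "test"],
}
doc_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranking = rerank(["transformer"], [1.0, 0.0], docs, doc_vecs)
```

In this toy corpus, document `b` contains no query term, so its TF score is zero and it ranks last unless its semantic score compensates. Raising `alpha` favors exact term matches, while lowering it favors semantic similarity.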

Full Text