SIFRANK Algorithm for Chinese Text Keyword Extraction Based on Dependent Semantic Feature Constraints

Qian Zhang, Yunwei Zhang, Mengyuan Zhu, Tao Shen, Tiancheng Wang, Yilin Zhao

doi:10.1109/iciea54703.2022.10006207

Abstract

The SIFRANK algorithm uses part-of-speech (POS) tags to extract noun phrases as the candidate set for keyword extraction. It then uses the pre-trained ELMo model to obtain semantic vectors for the current sentence and for each candidate phrase, and measures the semantic similarity between the two to produce the final keyword extraction results. However, it is difficult to write a suitable regular expression over POS tags. If the pattern is too strict, the candidate set is incomplete and expected candidate words are missed. If compound POS constraints such as gerunds are allowed, unreasonable compound words are produced, which degrades the final results. This paper therefore proposes DS-SIFRANK, a keyword extraction method that adds dependency semantic feature constraints. Building on SIFRANK, it broadens the POS tag restrictions, adds dependency relation constraints, and re-segments the candidate words and applies dependency syntactic parsing to them, which reduces the influence of inaccurate compound words on the extraction results. Experimental results show that the F1 score of DS-SIFRANK on the test set reaches 0.4872, which is 0.096 higher than that of SIFRANK.
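As a rough illustration of the two stages described in the abstract (POS-pattern candidate extraction, then ranking by semantic similarity), the following minimal sketch may help. It is not the authors' implementation: the single-letter tags (`n` noun, `a` adjective, `v` verb), the example tokens, the function names, and the hand-written two-dimensional vectors standing in for ELMo embeddings are all assumptions made for illustration. It also shows how a stricter tag pattern can drop an expected candidate.

```python
import math
import re

# Hypothetical POS-tagged sentence; one letter per tag so that regex
# match offsets on the tag string equal token indices.
tagged = [("keyword", "n"), ("extraction", "n"), ("uses", "v"),
          ("semantic", "a"), ("vectors", "n")]

def extract_candidates(tagged_tokens, pattern):
    """Return phrases whose tag sequence matches the given regex."""
    tag_string = "".join(tag for _, tag in tagged_tokens)
    phrases = []
    for m in re.finditer(pattern, tag_string):
        words = [tok for tok, _ in tagged_tokens[m.start():m.end()]]
        phrases.append(" ".join(words))
    return phrases

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(sentence_vec, candidate_vecs):
    """Sort candidate phrases by similarity to the sentence vector."""
    return sorted(candidate_vecs,
                  key=lambda c: cosine(sentence_vec, candidate_vecs[c]),
                  reverse=True)

# Looser pattern: optional adjectives followed by one or more nouns.
loose = extract_candidates(tagged, r"a*n+")
# Stricter pattern: nouns only; the adjective-noun phrase is lost.
strict = extract_candidates(tagged, r"n+")

# Toy vectors (assumed values, standing in for ELMo embeddings).
sentence_vec = [1.0, 0.0]
candidate_vecs = {"keyword extraction": [1.0, 0.1],
                  "semantic vectors": [0.2, 1.0]}
ranking = rank(sentence_vec, candidate_vecs)
```

With the looser pattern the candidates are "keyword extraction" and "semantic vectors"; with the stricter one, "semantic vectors" shrinks to "vectors", which is the kind of missing candidate the abstract attributes to an overly strict pattern.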

Full Text