Abstract

Subject words convey the key information of a text, and an automatic summary reflects its theme and core content. This paper studies multi-feature fusion algorithms for subject-word extraction and summary generation for Tibetan web text. First, Tibetan web pages are collected and preprocessed to extract their useful information. Second, the BCCF word segmentation algorithm is used to segment the text into words. Then a multi-feature fusion algorithm is proposed to extract the text's subject words: it combines several factors, such as word frequency, word length and word type, to calculate each word's weight and select the subject words effectively. For summary generation, a sentence-weighting algorithm is designed based on word frequency, sentence position and other features; the summarization algorithm computes the sentence weights, removes redundant sentences and assembles the summary. Experiments show that the multi-feature fusion algorithms for subject-word extraction and summary generation achieve good results. This research contributes to the study of Tibetan information processing.
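The abstract does not give the weighting formulas, so the following is only a minimal sketch of how a multi-feature fusion of this kind could work. It assumes the text has already been segmented (for example by BCCF) into sentences of `(word, type)` pairs. The linear combination, the coefficients `alpha`, `beta`, `gamma` and `delta`, the hypothetical `TYPE_WEIGHTS` table, the position decay and the Jaccard redundancy threshold are all illustrative assumptions, not the paper's actual method. Placeholder Latin tokens stand in for Tibetan words.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A segmented sentence: (word, type) pairs, e.g. segmenter output plus a tagger.
Sentence = List[Tuple[str, str]]

# HYPOTHETICAL per-type weights; the paper's actual "type" factor is not given.
TYPE_WEIGHTS = {"noun": 1.0, "verb": 0.6, "other": 0.3}


def word_weights(doc: List[Sentence], alpha: float = 0.5, beta: float = 0.2,
                 gamma: float = 0.3) -> Dict[str, float]:
    """Fuse normalized frequency, normalized length and a type weight (assumed linear)."""
    freq = Counter(w for sent in doc for w, _ in sent)
    if not freq:
        return {}
    word_type: Dict[str, str] = {}
    for sent in doc:
        for w, t in sent:
            word_type.setdefault(w, t)  # keep the first type seen for each word
    max_freq = max(freq.values())
    max_len = max(len(w) for w in freq)
    return {
        w: alpha * f / max_freq
        + beta * len(w) / max_len
        + gamma * TYPE_WEIGHTS.get(word_type[w], TYPE_WEIGHTS["other"])
        for w, f in freq.items()
    }


def subject_words(weights: Dict[str, float], k: int) -> List[str]:
    """Top-k words by fused weight (ties broken alphabetically for determinism)."""
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]


def sentence_score(sent: Sentence, index: int, weights: Dict[str, float],
                   delta: float = 0.2) -> float:
    """Mean word weight plus a HYPOTHETICAL bonus that decays with sentence position."""
    if not sent:
        return 0.0
    mean_weight = sum(weights[w] for w, _ in sent) / len(sent)
    return mean_weight + delta / (index + 1)


def jaccard(a: Sentence, b: Sentence) -> float:
    """Word-set overlap, used here as an assumed redundancy measure."""
    sa, sb = {w for w, _ in a}, {w for w, _ in b}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def summarize(doc: List[Sentence], n: int, threshold: float = 0.5) -> List[int]:
    """Pick up to n high-scoring, mutually non-redundant sentences; return their indices in text order."""
    weights = word_weights(doc)
    ranked = sorted(range(len(doc)),
                    key=lambda i: sentence_score(doc[i], i, weights), reverse=True)
    chosen: List[int] = []
    for i in ranked:
        if len(chosen) == n:
            break
        # Skip a sentence that overlaps too much with one already chosen.
        if all(jaccard(doc[i], doc[j]) < threshold for j in chosen):
            chosen.append(i)
    return sorted(chosen)
```

As a usage example, for three segmented sentences where the second repeats the first, `summarize(doc, 2)` keeps the first and third sentences and drops the repeat as redundant. Greedy selection in score order with an overlap check is one common way to combine "compute sentence weights" with "remove redundant sentences"; other redundancy measures would fit the same loop.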
