Abstract

Keywords of Tibetan texts are important in areas such as text clustering and categorization, automatic abstracting, and information retrieval (IR). However, Tibetan news webpages carry no keywords. In addition, many keyword extraction algorithms require a manually annotated corpus, which limits their extensibility. Because keywords can be regarded as a set of words that are important in a document and cohesively correlated with its subject, this paper improves the CHI-squared statistic and applies the idea of recommendation to extract keywords. Experiments on Tibetan news webpages show that this method performs better than TF-IDF combined with location information.
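For reference, the sketch below computes the classic, unmodified CHI-squared statistic for a term and a topic from a 2x2 document contingency table, as it is commonly used in text categorization. It is not the paper's improved statistic or its recommendation step, neither of which is specified in this abstract. The function names `chi_square` and `rank_terms`, the toy input format (documents as word sets with a boolean topic label), and the tie-breaking by term are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Classic 2x2 CHI-squared statistic for a term t and a topic T.

    a: documents in T that contain t
    b: documents outside T that contain t
    c: documents in T that do not contain t
    d: documents outside T that do not contain t
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A row or column of the table is empty: no evidence of dependence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def rank_terms(
    docs: List[Set[str]], in_topic: List[bool], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Score every term by classic CHI-squared and return the top_k terms.

    Ties are broken alphabetically so the output is deterministic.
    """
    vocab = set().union(*docs)
    scores: Dict[str, float] = {}
    for term in vocab:
        a = sum(1 for doc, t in zip(docs, in_topic) if t and term in doc)
        b = sum(1 for doc, t in zip(docs, in_topic) if not t and term in doc)
        c = sum(1 for doc, t in zip(docs, in_topic) if t and term not in doc)
        d = sum(1 for doc, t in zip(docs, in_topic) if not t and term not in doc)
        scores[term] = chi_square(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    docs = [{"lhasa", "yak"}, {"lhasa", "tea"}, {"market", "tea"}, {"market", "price"}]
    labels = [True, True, False, False]
    for term, score in rank_terms(docs, labels, top_k=3):
        print(term, round(score, 3))
```

Note that the classic statistic is symmetric: a term strongly associated with documents outside the topic (here "market") scores as highly as one strongly associated with the topic ("lhasa"). This property is one common motivation for modified CHI variants, though the abstract does not say which modification this paper makes.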
