Research on the Extraction Technology of Hot-words in Tibetan WebPages

Chang-Zhi Wang,Hui Wang,Gui-Xian Xu,T Gong,T Yang,J Xu

doi:10.1051/itmconf/20160701005

Open Access

https://doi.org/10.1051/itmconf/20160701005

Journal: ITM Web of Conferences	Publication Date: Jan 1, 2016
License type: cc-by

Affiliation: Minzu University of China

#Hot Words #Construction Of Corpus

Abstract

Building a Tibetan corpus is basic work in the field of Tibetan information processing. This paper uses web crawling, preprocessing, and real-time acquisition of web sites to obtain a large Tibetan corpus in a short time. Hot words reflect the topics that hold Tibetan people's attention during a given period. The paper adapts TF-IDF to Tibetan text information extraction and gives words in different locations different weights in order to extract the hot words. The self-made software proved effective at constructing the raw Tibetan corpus and extracting the hot words.
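
The location-weighted TF-IDF idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption, since the abstract gives no details: the location names ("title", "body"), the weight values, the smoothed IDF formula, and the function names are hypothetical, and the input is assumed to be already segmented into words. It is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical location weights; the abstract does not state the actual values.
POSITION_WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_tf(doc):
    """Return location-weighted term counts for one page.

    `doc` maps a location name to a list of already-segmented tokens.
    """
    tf = Counter()
    for location, tokens in doc.items():
        weight = POSITION_WEIGHTS.get(location, 1.0)
        for token in tokens:
            tf[token] += weight
    return tf


def hot_words(docs, top_k=5):
    """Rank tokens by summed weighted TF-IDF over all pages."""
    n_docs = len(docs)
    tfs = [weighted_tf(doc) for doc in docs]

    # Document frequency: number of pages that contain each token.
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())

    scores = Counter()
    for tf in tfs:
        total = sum(tf.values())
        for token, freq in tf.items():
            # Smoothed IDF (an assumed variant, chosen to avoid zero scores).
            idf = math.log((1 + n_docs) / (1 + df[token])) + 1.0
            scores[token] += (freq / total) * idf
    return [token for token, _ in scores.most_common(top_k)]
```

A call such as `hot_words(pages, top_k=10)` would take one dictionary per crawled page, for example `{"title": [...], "body": [...]}`, and return the ten highest-scoring tokens for that batch of pages.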
