Collection of Tibetan Network

Chang-Zhi Wang,Hui Wang,Guixian Xu

doi:10.12783/dtcse/cmsam2016/3628

Open Access

https://doi.org/10.12783/dtcse/cmsam2016/3628

Journal: DEStech Transactions on Computer Science and Engineering

Publication Date: Nov 17, 2016

#Tibetan Web #Tibetan Information

Abstract

With the development of Tibetan information technology, Tibetan web crawler technology has become extremely important. We construct a Web crawler to crawl different Tibetan websites, define page preprocessing rules specific to each site, and dump the collected Tibetan Web text into Tibetan documents. Experiments show that this self-made software and its preprocessing module can quickly and effectively build a large-scale Tibetan corpus, laying the foundation for Tibetan information processing technology.
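
The site-specific preprocessing described in the abstract might be sketched roughly as follows. This is a minimal illustration in Python, not the authors' software: the host names, the `SITE_RULES` table, and the names `BodyExtractor` and `extract_tibetan_text` are all hypothetical. The one fixed fact it relies on is that Tibetan script occupies the Unicode block U+0F00 to U+0FFF.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tibetan script occupies the Unicode block U+0F00..U+0FFF.
TIBETAN = re.compile(r"[\u0F00-\u0FFF]")

# Hypothetical per-site rules: which element holds the article body.
# Real sites would each need their own entry.
SITE_RULES = {
    "news.example.org": {"tag": "div", "attr": ("id", "content")},
    "blog.example.net": {"tag": "article", "attr": None},
}


class BodyExtractor(HTMLParser):
    """Collects text inside the element named by a site rule."""

    def __init__(self, tag, attr):
        super().__init__()
        self.tag, self.attr = tag, attr
        self.depth = 0   # nesting depth inside the target element
        self.skip = 0    # nesting depth inside <script>/<style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:
                self.depth += 1
            if tag in ("script", "style"):
                self.skip += 1
        elif tag == self.tag and (self.attr is None or self.attr in attrs):
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            if tag in ("script", "style") and self.skip:
                self.skip -= 1
            if tag == self.tag:
                self.depth -= 1

    def handle_data(self, data):
        if self.depth and not self.skip:
            self.chunks.append(data)


def extract_tibetan_text(url, html):
    """Apply the rule for the URL's host and keep lines containing Tibetan."""
    rule = SITE_RULES.get(urlparse(url).hostname)
    if rule is None:
        return ""  # no preprocessing rule defined for this site
    parser = BodyExtractor(rule["tag"], rule["attr"])
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if TIBETAN.search(line))
```

In this sketch a crawler would call `extract_tibetan_text(url, html)` on each fetched page and write any non-empty result to a document in the corpus. Keeping the rules in a table separate from the parser means that supporting a new site only requires a new table entry.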

Full Text