Research on Chinese segmentation algorithm based on Hadoop cloud platform

Hong Chen

doi:10.2991/itoec-15.2015.29

Abstract

IKAnalyzer (IK) and ICTCLAS (IC) are popular Chinese word segmentation algorithms that play an important role in processing text data in a stand-alone environment. In this paper, we compare the performance of the IK and IC algorithms through theory and experiments. We report experimental work on segmenting massive volumes of Chinese text and on an optimized solution that runs on a Hadoop cluster, using the Hadoop Distributed File System (HDFS) for storage and the MapReduce programming framework to process large data sets in parallel. The results of various experiments indicate that the optimized IC and IK algorithms perform favorably on mass Chinese text segmentation. In addition, to present the large processed data set more easily and directly, we sort the segmented words by frequency in descending (inverted) order. This comparative study of the two Chinese segmentation algorithms on the Hadoop platform provides strong support for the efficient processing of massive Chinese information.
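The pipeline the abstract describes (segment Chinese text in parallel with MapReduce, count word frequencies, then list words in descending order of frequency) can be sketched in-process as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the segmenter is a toy dictionary-based forward maximum matching routine standing in for IK or ICTCLAS (neither works exactly this way), `DICTIONARY` and all function names are hypothetical, and the map, shuffle, and reduce steps are simulated in memory rather than run as Hadoop `Mapper`/`Reducer` classes. The paper does not say how its descending sort is implemented; here it is a plain in-memory sort.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical toy lexicon; a real job would load IK's or ICTCLAS's dictionary.
DICTIONARY = {"中文", "分词", "算法", "云平台", "平台", "研究"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def segment(text: str) -> List[str]:
    """Toy forward maximum matching (a stand-in for IK/IC): at each position
    take the longest dictionary word, else fall back to a single character."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in DICTIONARY:
                words.append(candidate)
                i += length
                break
    return words


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: segment one input line and emit a (word, 1) pair per word."""
    for word in segment(line):
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: sum the partial counts for one word."""
    return word, sum(counts)


def word_frequencies(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Run map -> shuffle -> reduce, then sort by frequency, highest first."""
    mapped = (pair for line in lines for pair in map_phase(line))
    reduced = [reduce_phase(w, c) for w, c in shuffle(mapped).items()]
    # Descending by count; ties broken by the word itself for stable output.
    return sorted(reduced, key=lambda wc: (-wc[1], wc[0]))
```

For example, `word_frequencies(["中文分词算法研究", "云平台中文分词"])` counts 中文 and 分词 twice each and 算法, 研究, and 云平台 once each, with the two most frequent words listed first. On a real cluster, the mapper would run once per HDFS input split and the framework would perform the shuffle across nodes.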

Similar Papers

A Comparison of Approaches to Chinese Word Segmentation in Hadoop
Zhangang Wang ... Bangjie Meng
01 Dec 2014

Addressing big data problem using Hadoop and Map Reduce
Aditya B Patel ... Manashvi Birla
01 Dec 2012

Mining Web data for Chinese segmentation
Fu Lee Wang ... Christopher C Yang
Journal of the American Society for Information Science and Technology, Vol. 58, 17 Aug 2007

Enhancing Availability and Reliability of Cloud Data through Syncopy
Tsozen Yeh ... Huichen Lee
01 Sep 2014
