Chinese Web Text Outlier Mining Based on Domain Knowledge

Xia Huosong, Fan Zhaoyan, Peng Liuyan

doi:10.1109/gcis.2010.66

Abstract

Web text mining is a growing research area in data mining. Existing Web text mining algorithms have concentrated on finding frequent patterns and discard the less frequent ones, which may contain outliers. In addition, domain knowledge differs in part from one industry to another, yet web texts are analyzed with the same dictionary regardless of the domain they belong to. This paper proposes formal definitions of Web text outliers and Web text outlier mining, and presents a framework for Web text outlier mining based on domain knowledge. To verify the feasibility of the framework, an algorithm for mining Chinese Web text outliers is proposed, based on an improved vector space model (VSM) and n-grams. Experimental results on the insurance topic show that the algorithm can effectively find Chinese Web text outliers in web text data, with higher precision and recall and lower complexity.
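
The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the general idea (n-gram features in a vector space model, weighted by a domain dictionary), not the authors' method. It assumes character bigrams as features, a hypothetical `boost` factor for n-grams found in a hypothetical `domain_terms` set, and one minus the cosine similarity to the collection centroid as the outlier score. All function names and parameters are invented for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Return overlapping character n-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def vectorize(text, domain_terms, n=2, boost=2.0):
    """Build a term-frequency vector over n-grams.

    Assumption: n-grams listed in the domain dictionary are multiplied
    by `boost`; the paper's actual weighting scheme is not given here.
    """
    counts = Counter(char_ngrams(text, n))
    return {g: c * (boost if g in domain_terms else 1.0)
            for g, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def outlier_scores(texts, domain_terms, n=2):
    """Score each text by its dissimilarity to the collection centroid.

    The centroid is the unnormalized sum of all vectors, which is
    enough for cosine similarity. Higher scores suggest outliers.
    """
    vectors = [vectorize(t, domain_terms, n) for t in texts]
    centroid = Counter()
    for vec in vectors:
        for gram, weight in vec.items():
            centroid[gram] += weight
    return [1.0 - cosine(vec, centroid) for vec in vectors]
```

Given a small collection of insurance-related sentences plus one unrelated sentence, the unrelated one would be expected to receive the highest score, since it shares no bigrams with the rest. Choosing a threshold on the score, and building the domain dictionary itself, are left open here.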
