Lexical Semantic SLVM for Semi-structured Document Classification

Luda Wang

doi:10.12733/jics20105225

Abstract

The Structured Link Vector Model (SLVM) and its improved version rely on statistical term measures to represent semi-structured documents. As a result, they ignore the lexical semantic content of terms and the distilled mutual information, which leads to text classification errors. This work proposes a document representation method, WordNet-based lexical semantic SLVM, to address the problem. Using WordNet, the method constructs a data structure that characterizes lexical semantic content and adapts EM modeling to disambiguate word stems. A feature matrix for document representation is then built in the lexical-semantic feature space of the semi-structured documents and passed to the NWKNN classification algorithm. On a categorized Wikipedia XML dataset, the experimental results show that the feature matrix of the proposed method achieves a better F1 measure than the original SLVM and frequent sub-tree SLVM based on TF-IDF.
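The abstract names NWKNN as the classifier but does not spell it out. The sketch below illustrates the neighbor-weighted KNN idea as it is commonly described: each class receives a weight that shrinks as the class grows, and a query document is assigned to the class with the highest weighted sum of similarities among its k nearest neighbors. The function names, the cosine similarity, the default `k`, and the `exponent` parameter are assumptions made for illustration; the paper's own feature matrix and settings are not reproduced here.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def class_weights(labels, exponent=2.0):
    """Weight each class by 1 / (size / smallest_size) ** (1 / exponent).

    The smallest class gets weight 1.0; larger classes get smaller weights.
    `exponent` is an assumed tuning parameter (> 1).
    """
    counts = Counter(labels)
    smallest = min(counts.values())
    return {c: 1.0 / (n / smallest) ** (1.0 / exponent) for c, n in counts.items()}


def nwknn_predict(train_vectors, train_labels, query, k=3, exponent=2.0):
    """Return the class with the highest weighted similarity among k neighbors."""
    weights = class_weights(train_labels, exponent)
    # Rank training documents by similarity to the query and keep the top k.
    neighbors = sorted(
        ((cosine(query, vec), label) for vec, label in zip(train_vectors, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = {}
    for sim, label in neighbors:
        scores[label] = scores.get(label, 0.0) + weights[label] * sim
    return max(scores, key=scores.get)


# Toy, made-up data: class "a" has four documents, class "b" has one.
train_vectors = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [1.0, 0.2], [0.0, 1.0]]
train_labels = ["a", "a", "a", "a", "b"]
prediction = nwknn_predict(train_vectors, train_labels, [0.6, 0.8], k=3)
```

In this toy setup two of the three nearest neighbors belong to the large class, so an unweighted vote would favor "a"; the class weights are what let the small class win. Under the abstract's description, the rows of the lexical-semantic feature matrix would take the place of `train_vectors`.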

More From: Journal of Information and Computational Science

