Part-of-Speech Tags and ICE Text Classification

Alex Chengyu Fang, Jing Cao

doi:10.1007/978-3-662-45100-7_5

Abstract

Part-of-speech (POS) tags have been employed in automatic genre classification because they do not ‘reflect the topic of the document, but rather the type of text used in the document’ and because their distribution has been observed to vary across genres. The current study introduces a new set of linguistically fine-grained POS tags, generated by AUTASYS, for automatic genre classification. The experiment was designed to investigate the impact of the proposed feature set in comparison with word unigrams as a bag of words (BOW) and with an impoverished POS tag set. Machine-learning tools were used to evaluate classification performance in terms of F-score. The British component of the International Corpus of English (ICE-GB) was employed as a resource of different text genres. Ten genre classification tasks were identified based on the existing ICE-GB categories, which are grouped at different levels of granularity. As our results will show, the use of linguistically rich POS tags as discriminative features produces superior accuracy when compared with BOW for fine-grained genre classification. Our results will further demonstrate that the superior performance is due to the rich linguistic information, since an impoverished tag set yielded worse classification results.
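
The comparison described in the abstract can be illustrated with a minimal sketch of the two kinds of feature vector and of the F-score. It is not the authors' implementation. It assumes texts arrive as lists of (word, tag) pairs; the tag names in the sample are placeholders rather than AUTASYS output; the function names are hypothetical; and macro-averaging over genres is an assumption, since the abstract does not say how F-scores were aggregated.

```python
from collections import Counter


def tag_features(tagged_tokens):
    """Relative frequency of each POS tag in one text."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


def bow_features(tagged_tokens):
    """Relative frequency of each lower-cased word unigram in one text."""
    counts = Counter(word.lower() for word, _ in tagged_tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def f_score(gold, predicted, label):
    """F-score for one genre label: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f_score(gold, predicted):
    """Unweighted mean of per-genre F-scores (an assumed aggregation)."""
    labels = sorted(set(gold))
    return sum(f_score(gold, predicted, label) for label in labels) / len(labels)


# Placeholder input: tags here are illustrative, not a real tag set.
sample = [("The", "DET"), ("cat", "N"), ("sat", "V"), ("the", "DET")]
pos_vector = tag_features(sample)
bow_vector = bow_features(sample)
```

Either vector could then be passed to any standard classifier. Under this framing, a richer tag set changes only the keys produced by `tag_features`, which is what makes the fine-grained and impoverished tag sets directly comparable.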
