Abstract
Feature sparseness is a problem common to cross-domain and short-text classification tasks. To overcome this problem, we propose a novel method based on graph decomposition to find candidate features for expanding feature vectors. Specifically, we first create a feature-relatedness graph, which is subsequently decomposed into core-periphery (CP) pairs, and we use the peripheries as the expansion candidates of the cores. We expand both training and test instances using the computed related features and use them to train a text classifier. We observe that prioritising features that are common to both training and test instances as cores during the CP decomposition further improves the accuracy of text classification. We evaluate the proposed CP-decomposition-based feature expansion method on benchmark datasets for cross-domain sentiment classification and short-text classification. Our experimental results show that the proposed method consistently outperforms all baselines on short-text classification tasks, and performs competitively with pivot-based cross-domain sentiment classification methods.
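The expansion step described above can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a precomputed CP decomposition given as a hypothetical mapping from each core feature to its periphery set, and expands a bag-of-features instance by appending the peripheries of every core it contains:

```python
# Hypothetical CP pairs: core feature -> periphery (related expansion candidates).
# In the actual method these come from decomposing a feature-relatedness graph.
cp_pairs = {
    "excellent": {"superb", "great"},
    "boring":    {"dull", "tedious"},
}

def expand(instance_features, cp_pairs):
    """Append the periphery features of every core present in the instance."""
    expanded = set(instance_features)
    for feature in instance_features:
        expanded |= cp_pairs.get(feature, set())
    return expanded

doc = {"excellent", "plot"}
print(sorted(expand(doc, cp_pairs)))  # ['excellent', 'great', 'plot', 'superb']
```

Both training and test instances would be expanded this way before training the classifier, reducing the sparsity of their feature vectors.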
Highlights
Short-texts are abundant on the Web and appear in various formats such as microblogs (Kwak et al., 2010), Question and Answer (QA) forums, review sites, Short Message Service (SMS), email, and chat messages (Cong et al., 2008; Thelwall et al., 2010)
To address the feature sparseness problem encountered in short-text and cross-domain classification tasks, we propose a novel method that computes related features that can be appended to the feature vectors to reduce the sparsity
We evaluate the effectiveness of the proposed method using benchmark datasets for two different tasks: short-text classification and cross-domain sentiment classification
Summary
Short-texts are abundant on the Web and appear in various formats such as microblogs (Kwak et al., 2010), Question and Answer (QA) forums, review sites, Short Message Service (SMS), email, and chat messages (Cong et al., 2008; Thelwall et al., 2010). The frequency of a feature in a short-text will be small, which makes it difficult to reliably estimate the salience of a feature using term frequency-based methods. This is known as the feature sparseness problem in text classification. To address the feature sparseness problem encountered in short-text and cross-domain classification tasks, we propose a novel method that computes related features that can be appended to the feature vectors to reduce the sparsity. Prior work on pivot-based cross-domain sentiment classification methods has used features that are frequent in both the training (source) and test (target) data as expansion candidates to overcome the feature mismatch problem. In cross-domain sentiment classification experiments, the proposed method outperforms previously proposed pivot-based methods such as structural correspondence learning (SCL) (Blitzer et al., 2006).
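The pivot idea mentioned above can be sketched in a few lines. This is an illustrative simplification, not the paper's code or SCL itself: it merely selects as pivot/core candidates those features that occur frequently in both the source (training) and target (test) document collections, using a hypothetical `min_count` threshold.

```python
from collections import Counter

def select_pivots(source_docs, target_docs, min_count=2):
    """Return features frequent in BOTH domains (simplified pivot selection)."""
    src = Counter(f for doc in source_docs for f in doc)
    tgt = Counter(f for doc in target_docs for f in doc)
    return {f for f in src if src[f] >= min_count and tgt.get(f, 0) >= min_count}

# Toy example: "good" appears often in both a kitchen (source) and a movie
# (target) domain, so it survives as a pivot candidate.
source = [{"good", "battery"}, {"good", "screen"}]
target = [{"good", "plot"}, {"good", "actor"}]
print(select_pivots(source, target))  # {'good'}
```

In the proposed method, such domain-shared features would be prioritised as cores during the CP decomposition, which the experiments show further improves classification accuracy.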