Abstract

Feature sparseness is a problem common to cross-domain and short-text classification tasks. To overcome this problem, we propose a novel method based on graph decomposition for finding candidate features with which to expand feature vectors. Specifically, we first create a feature-relatedness graph, which we then decompose into core-periphery (CP) pairs, using the peripheries as the expansion candidates for their cores. We expand both training and test instances with the computed related features and use them to train a text classifier. We observe that prioritising features common to both training and test instances as cores during the CP decomposition further improves the accuracy of text classification. We evaluate the proposed CP-decomposition-based feature expansion method on benchmark datasets for cross-domain sentiment classification and short-text classification. Our experimental results show that the proposed method consistently outperforms all baselines on short-text classification tasks and performs competitively with pivot-based cross-domain sentiment classification methods.
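
To make the expansion pipeline concrete, the following Python sketch builds a co-occurrence-based feature-relatedness graph and expands documents with the neighbourhoods of designated core features. It is only an illustration of the idea: the neighbourhood-based core-periphery split, the function names, and the toy data are assumptions for exposition, not the decomposition algorithm described in the paper.

```python
# Illustrative sketch of CP-decomposition-based feature expansion.
# The core/periphery split below is a simple stand-in (a core's periphery is
# its graph neighbourhood); it is NOT the authors' decomposition algorithm.
from collections import defaultdict
from itertools import combinations

def build_feature_graph(documents):
    """Build an undirected feature-relatedness graph weighted by co-occurrence."""
    weights = defaultdict(int)
    for doc in documents:                          # doc: iterable of feature strings
        for u, v in combinations(sorted(set(doc)), 2):
            weights[(u, v)] += 1
    graph = defaultdict(dict)
    for (u, v), w in weights.items():
        graph[u][v] = w
        graph[v][u] = w
    return graph

def core_periphery_pairs(graph, core_features):
    """For each designated core feature, take its neighbours as the periphery."""
    return {c: set(graph.get(c, {})) for c in core_features}

def expand(doc, cp_pairs):
    """Append periphery features of every core appearing in the document."""
    expanded = list(doc)
    for feature in doc:
        expanded.extend(cp_pairs.get(feature, ()))
    return expanded

# Example: prioritise features common to both training and test data as cores.
train = [["good", "battery", "life"], ["poor", "screen"]]
test  = [["good", "screen", "resolution"]]
common = {f for d in train for f in d} & {f for d in test for f in d}
graph = build_feature_graph(train + test)
cp = core_periphery_pairs(graph, common)
print(expand(test[0], cp))
```

The expanded documents can then be vectorised and fed to any standard text classifier, which is the role the expansion step plays in the evaluation described below.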

Highlights

  • Short texts are abundant on the Web and appear in various formats such as microblogs (Kwak et al., 2010), Question and Answer (QA) forums, review sites, Short Message Service (SMS), email, and chat messages (Cong et al., 2008; Thelwall et al., 2010)

  • To address the feature sparseness problem encountered in short-text and cross-domain classification tasks, we propose a novel method that computes related features that can be appended to the feature vectors to reduce the sparsity

  • We evaluate the effectiveness of the proposed method using benchmark datasets for two different tasks: short-text classification and cross-domain sentiment classification

Summary

Introduction

Short texts are abundant on the Web and appear in various formats such as microblogs (Kwak et al., 2010), Question and Answer (QA) forums, review sites, Short Message Service (SMS), email, and chat messages (Cong et al., 2008; Thelwall et al., 2010). The frequency of any individual feature in a short text is small, which makes it difficult to reliably estimate the salience of a feature using term frequency-based methods. This is known as the feature sparseness problem in text classification. To address the feature sparseness problem encountered in short-text and cross-domain classification tasks, we propose a novel method that computes related features that can be appended to the feature vectors to reduce the sparsity. Prior work on pivot-based cross-domain sentiment classification has used features that are frequent in both the training (source) and test (target) data as expansion candidates to overcome the feature mismatch problem. In cross-domain sentiment classification experiments, the proposed method outperforms previously proposed pivot-based methods such as structural correspondence learning (SCL) (Blitzer et al., 2006).
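
The pivot idea mentioned above can be illustrated with a small, hedged sketch: rank features by how frequent they are in both the source and the target data and keep the top-scoring ones as expansion candidates. The scoring rule (the minimum of the two domain frequencies), the `select_pivots` name, and the cut-off are illustrative assumptions, not the selection criterion used by SCL or by the proposed method.

```python
# Hedged sketch of pivot selection: keep features frequent in BOTH domains.
from collections import Counter

def select_pivots(source_docs, target_docs, top_k=100):
    src = Counter(f for doc in source_docs for f in doc)
    tgt = Counter(f for doc in target_docs for f in doc)
    shared = set(src) & set(tgt)
    # Score each shared feature by the smaller of its two domain frequencies,
    # so a feature must be reasonably frequent in both domains to rank highly.
    scored = sorted(shared, key=lambda f: min(src[f], tgt[f]), reverse=True)
    return scored[:top_k]

source = [["excellent", "plot", "acting"], ["boring", "plot"]]
target = [["excellent", "battery"], ["boring", "screen"], ["excellent", "price"]]
print(select_pivots(source, target, top_k=3))   # e.g. ['excellent', 'boring'] (tie order may vary)
```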

Related Work
CP-decomposition-based Feature Expansion
Core-Periphery Decomposition
Feature-Relatedness Graph
Feature Expansion
Experiments
Methods
Classification Accuracy
Findings
Conclusion