Abstract

Short text classification is a fundamental problem in natural language processing, social network analysis, and e-commerce. The lack of structure in short text sequences limits the success of popular NLP methods based on deep learning, and simpler methods that rely on bag-of-words representations tend to perform on par with complex deep learning methods. To tackle the limitations of textual features in short text, we propose a Graph-regularized Graph Convolution Network (GR-GCN), which augments graph convolution networks by incorporating label dependencies in the output space. Our model achieves state-of-the-art results on both proprietary and external datasets, outperforming several baseline methods by up to 6%. Furthermore, we show that, compared to baseline methods, GR-GCN is more robust to noise in textual features.
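The abstract describes GR-GCN only at a high level. The following is a minimal sketch, not the paper's formulation. It assumes a standard two-layer GCN with the symmetric normalization of Kipf and Welling over the input (short-text) graph. It also assumes that the output-space label dependencies enter as a hypothetical Laplacian-style penalty, which pulls together the output-weight columns of classes linked in the label graph. The function names, the penalty form and the weight `lam` are illustrative assumptions.

```python
import math


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in a standard GCN."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # self-loops guarantee deg > 0
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a_norm, h, w, activation=True):
    """One propagation step H' = ReLU(A_norm H W); ReLU is skipped for the output layer."""
    out = matmul(matmul(a_norm, h), w)
    if activation:
        out = [[max(0.0, v) for v in row] for row in out]
    return out


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, labels, train_idx):
    """Mean negative log-likelihood over the labelled nodes only."""
    loss = 0.0
    for i in train_idx:
        loss -= math.log(softmax(logits[i])[labels[i]])
    return loss / len(train_idx)


def label_graph_penalty(w_out, label_edges):
    """Hypothetical output-space regularizer: sum of s * ||w_j - w_k||^2 over
    label-graph edges (j, k, s), where w_j is column j of the output weights."""
    penalty = 0.0
    for j, k, s in label_edges:
        penalty += s * sum((row[j] - row[k]) ** 2 for row in w_out)
    return penalty


def gr_gcn_loss(adj, x, w1, w2, labels, train_idx, label_edges, lam=0.1):
    """Two-layer GCN classification loss plus the (assumed) label-graph penalty."""
    a_norm = normalize_adjacency(adj)
    hidden = gcn_layer(a_norm, x, w1)
    logits = gcn_layer(a_norm, hidden, w2, activation=False)
    return cross_entropy(logits, labels, train_idx) + lam * label_graph_penalty(w2, label_edges)


if __name__ == "__main__":
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]      # 3 short texts in a chain
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # toy 2-d text features
    w1 = [[0.5, -0.2], [0.1, 0.3]]               # 2 -> 2 hidden units
    w2 = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]      # 2 hidden -> 3 classes
    print(gr_gcn_loss(adj, x, w1, w2, [0, 1, 2], [0, 1], [(0, 1, 1.0)]))
```

Because the penalty in this sketch only touches the final weight matrix, classes that are adjacent in the label graph are nudged toward similar decision boundaries. This is one plausible way to inject taxonomy structure when the text features themselves are sparse.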

Highlights

  • Short-text classification is a common problem in information retrieval (Ji et al., 2014) and has applications in several domains including e-commerce (Yu et al., 2012; Shen et al., 2009), social media (Kateb and Kalita, 2015), healthcare (Pestian et al., 2007) and cognitive-biometric recognition (Pokhriyal et al., 2016).

  • We develop a short text classification technique for solving two problems relevant to product search on e-commerce platforms: 1) Product Query Classification (PQC): when a customer enters a free-form query, it is important to understand their product-type intent in order to recommend and advertise relevant products.

  • Graph-regularized Graph Convolution Network (GR-GCN) outperforms all baseline models in classification accuracy by a margin of 6% on the Internal dataset, 2.8% on the Electronics dataset and 3.8% on the Home dataset.


Summary

Introduction

Short-text classification is a common problem in information retrieval (Ji et al., 2014) and has applications in several domains including e-commerce (Yu et al., 2012; Shen et al., 2009), social media (Kateb and Kalita, 2015), healthcare (Pestian et al., 2007) and cognitive-biometric recognition (Pokhriyal et al., 2016). The titles "PhotoFast microSD to MS Pro Duo CR-5300", "Kingston microSD Card" and "8GB card for Blackberry Storm 9530" all belong to the same genre of microSD card products and need to be listed under the same category. All these factors make it difficult to separate product-type classes by relying purely on text, which is heterogeneous and noisy. In the output space, relationships between product-type classes can be modeled using product-category taxonomies, which are typically hand-curated and readily available in e-commerce applications. Such auxiliary information can be naturally represented in graphical form, where each node represents a short text (input graph) or a class label (output graph), and an edge weight indicates the magnitude of similarity between two nodes. We add noise to the input data and show that the presence of the graph makes our method more robust to noise than baseline methods that rely on textual features alone.
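The summary names the two auxiliary graphs but does not say how their edges are computed. The following is a minimal sketch under stated assumptions, not the paper's construction. For the input graph, it assumes that short texts are linked when their token-set Jaccard similarity exceeds a hypothetical `threshold`, with the similarity used as the edge weight. For the output graph, it assumes that two class labels are linked when they share a parent in a taxonomy, given here as a hypothetical `parent_of` mapping. The category names in the example are invented for illustration.

```python
from itertools import combinations


def tokenize(text):
    """Lower-cased whitespace tokens as a set (assumed preprocessing)."""
    return set(text.lower().split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_input_graph(texts, threshold=0.1):
    """Input graph: edge (i, j) weighted by Jaccard similarity, kept if above threshold."""
    tokens = [tokenize(t) for t in texts]
    edges = {}
    for i, j in combinations(range(len(texts)), 2):
        sim = jaccard(tokens[i], tokens[j])
        if sim > threshold:
            edges[(i, j)] = sim
    return edges


def build_label_graph(parent_of):
    """Output graph: weight-1.0 edge between labels sharing a taxonomy parent."""
    labels = sorted(parent_of)
    return {(a, b): 1.0 for a, b in combinations(labels, 2) if parent_of[a] == parent_of[b]}


if __name__ == "__main__":
    titles = [
        "PhotoFast microSD to MS Pro Duo CR-5300",
        "Kingston microSD Card",
        "8GB card for Blackberry Storm 9530",
    ]
    print(build_input_graph(titles))
    # Hypothetical taxonomy fragment: leaf class -> parent category.
    parent_of = {"microSD cards": "Memory", "USB drives": "Memory", "Laptop bags": "Accessories"}
    print(build_label_graph(parent_of))
```

On the three microSD titles, the first and third share no tokens, so this sketch links them only indirectly, through "Kingston microSD Card" (via "microsd" and "card" respectively). A two-layer GCN can still pass information between them over that two-hop path, which illustrates why an input graph can help when individual titles are lexically sparse.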

Proposed Approach
Graph Construction
Experiments and Results
Quantitative Results
Impact of Incorporating Auxiliary Graphs
Robustness comparison
Effect of the size of the Labelled Data
Conclusion