Abstract

Short text is common in many applications, including the Internet of Things (IoT). Its sparsity and brevity can severely hamper appropriate representation and classification. One important solution is to enrich the short text representation with cognitive aspects of the text, including semantic concepts, knowledge, and categories. In this paper, we propose a named Entity-based Concept Knowledge-Aware (ECKA) representation model that incorporates semantic information into the short text representation. ECKA is a multi-level short text semantic representation model that uses CNNs to extract semantic features at the word, entity, concept, and knowledge levels, respectively. Because the words, entities, concepts, and knowledge entities in the same short text differ in how informative they are for short text classification, attention networks are used to capture category-related attentive representations from each level of textual features. The final multi-level semantic representation is formed by concatenating these individual-level representations and is used for text classification. Experiments on three tasks demonstrate that our method significantly outperforms state-of-the-art methods.
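
The multi-level pipeline described in the abstract (per-level convolution, attention, then concatenation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the embedding size, filter count, window width, random weights, and the dot-product attention with one query vector per level are all hypothetical choices, and the paper's trained parameters and category-related attention formulation are not reproduced here.

```python
import math
import random


def conv1d_relu(seq, filters, width):
    """Slide each filter over `seq` (a list of d-dim vectors) and apply ReLU.

    Returns one feature vector (one value per filter) for every window.
    Assumes len(seq) >= width.
    """
    out = []
    for start in range(len(seq) - width + 1):
        # Flatten the `width` consecutive embeddings into one window vector.
        window = [x for vec in seq[start:start + width] for x in vec]
        out.append([max(0.0, sum(w * x for w, x in zip(f, window)))
                    for f in filters])
    return out


def attention_pool(features, query):
    """Softmax-weighted sum of feature vectors, scored by dot product with `query`."""
    scores = [sum(q * h for q, h in zip(query, feat)) for feat in features]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * feat[i] for w, feat in zip(weights, features))
              for i in range(dim)]
    return pooled, weights


def encode_level(seq, filters, width, query):
    """CNN features followed by attention pooling for one semantic level."""
    pooled, _ = attention_pool(conv1d_relu(seq, filters, width), query)
    return pooled


# Hypothetical sizes and random stand-ins for learned embeddings and weights.
rng = random.Random(0)
EMB_DIM, N_FILTERS, WIDTH = 4, 3, 2


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Number of items found at each level of one short text (illustrative only).
levels = {"word": 6, "entity": 3, "concept": 3, "knowledge": 2}

representation = []
for name, length in levels.items():
    seq = rand_matrix(length, EMB_DIM)                  # embedded sequence
    filters = rand_matrix(N_FILTERS, WIDTH * EMB_DIM)   # convolution filters
    query = rand_matrix(1, N_FILTERS)[0]                # attention query vector
    # Concatenate the attentive representation of each level.
    representation.extend(encode_level(seq, filters, WIDTH, query))

print(len(representation))
```

With four levels and three filters per level, `representation` holds 12 values; in a full model this concatenated vector would be passed to a classifier.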

Highlights

  • With the development of the Internet of Things (IoT) [1], a wide variety of information appears online and in IoT networks in the form of short text, such as short descriptions, social media posts, news descriptions, product reviews, and instant messages.

  • To capture category-related informative representations from the multi-level features, we build a joint model that uses a CNN-based attention network to capture the attentive representation of each level; the embeddings learned from the different aspects are then concatenated to form the short text representation.

  • IoT networks carry a growing volume of short text, which cannot be handled by document representation methods and classic NLP tools.

Summary

Introduction

With the development of the Internet of Things (IoT) [1], a wide variety of information appears online and in IoT networks in the form of short text, such as short descriptions, social media posts, news descriptions, product reviews, and instant messages. Existing methods that draw on external knowledge bases gain more accurate short text representations, but limitations remain in how they combine that extra knowledge: they do not make full use of the knowledge bases, because they consider only one aspect (only the entity or only the concept information) to enrich the short text representation. To capture more semantic information, we use a named entity-based approach to obtain external knowledge information: entities, concepts, and knowledge graph entities. This external knowledge is used to enrich the short text semantic representation. To capture category-related informative representations from the multi-level features, we build a joint model that uses a CNN-based attention network to capture the attentive representation of each level; the embeddings learned from the different aspects are then concatenated to form the short text representation. The rest of this paper is organized as follows: Section 2 briefly reviews the related work; Section 3 presents the details of the proposed method; Section 4 presents the experiments and analysis; lastly, Section 5 concludes the paper and outlines future work.

Related Work
The ECKA Method
Semantic Information Retrieval Module
Feature Extraction Module
The Input Layer
The Embedding Layer
The Representation Layer
The Attention Module
Experiments
Datasets
Data Preprocessing
Baselines
Parameter Setting
Result Analysis
Comparison of ECKA Variants on Multiple Sources
Parameter Sensitivity
Findings
Conclusions
