Abstract
Short text is common in many applications, including the Internet of Things (IoT). Its sparsity and brevity can severely hamper appropriate representation and classification. One important solution is to enrich the short text representation with cognitive aspects of the text, including semantic concepts, knowledge, and categories. In this paper, we propose a named Entity-based Concept Knowledge-Aware (ECKA) representation model, which incorporates semantic information into short text representation. ECKA is a multi-level short text semantic representation model that uses a CNN to extract semantic features at each of the word, entity, concept, and knowledge levels. Since the words, entities, concepts, and knowledge entities in the same short text differ in their cognitive informativeness for short text classification, attention networks are used to capture category-related attentive representations from the features at each level. The final multi-level semantic representation is formed by concatenating these individual-level representations and is used for text classification. Experiments on three tasks demonstrate that our method significantly outperforms state-of-the-art methods.
Highlights
With the development of the Internet of Things (IoT) [1], a wide variety of information can be found online and on IoT networks in the form of short text, such as short descriptions, social media posts, news descriptions, product reviews, and instant messages.
To capture category-related informative representations from multi-level features, we build a joint model that uses a CNN-based attention network to capture the attentive representation at each level; the embeddings learned from the different aspects are concatenated to form the short text representation.
IoT networks carry an increasing volume of short text, which document-level representation methods and classic NLP tools cannot handle.
Summary
With the development of the Internet of Things (IoT) [1], a wide variety of information can be found online and on IoT networks in the form of short text, such as short descriptions, social media posts, news descriptions, product reviews, and instant messages. Although methods that enrich short text with knowledge bases gain more accurate short text representations, they remain limited in how they combine external knowledge bases; that is, they still fail to make full use of them. They consider only one aspect from the knowledge bases (either the entity or the concept information) to enrich the short text representation. To capture more semantic information, we use a named entity-based approach to obtain three kinds of external knowledge information: entities, concepts, and the knowledge graph. This external knowledge information is used to enrich the short text semantic representation. To capture category-related informative representations from multi-level features, we build a joint model that uses a CNN-based attention network to capture the attentive representation at each level; the embeddings learned from the different aspects are concatenated to form the short text representation. The rest of this paper is organized as follows: Section 2 briefly reviews the related work; Section 3 presents the details of the proposed method; Section 4 presents the experiments and analysis; lastly, Section 5 concludes the paper and outlines future work.
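The multi-level pipeline described above (per-level CNN features, per-level attention, then concatenation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the dimensions, the randomly initialized weights, and the dot-product form of the attention are all assumptions made for illustration, since the summary does not specify them.

```python
import numpy as np

# The four levels named in the summary.
LEVELS = ("word", "entity", "concept", "knowledge")


def conv1d_relu(x, filters):
    """Valid 1-D convolution over the sequence axis, followed by ReLU.

    x:       (seq_len, emb_dim) embeddings for one level
    filters: (width, emb_dim, n_filters) convolution kernels (assumed shape)
    returns: (seq_len - width + 1, n_filters) feature map
    """
    width = filters.shape[0]
    n_windows = x.shape[0] - width + 1
    windows = np.stack([x[i:i + width] for i in range(n_windows)])
    feats = np.einsum("nwe,wef->nf", windows, filters)
    return np.maximum(feats, 0.0)


def attention_pool(feats, query):
    """Softmax attention over positions (assumed dot-product scoring).

    feats: (n_positions, n_filters); query: (n_filters,)
    returns: the attention-weighted sum of feats and the weights themselves
    """
    scores = feats @ query
    scores = scores - scores.max()  # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ feats, weights


def multi_level_representation(inputs, params):
    """Concatenate the attentive CNN representation of every level."""
    parts = []
    for level in LEVELS:
        feats = conv1d_relu(inputs[level], params[level]["filters"])
        pooled, _ = attention_pool(feats, params[level]["query"])
        parts.append(pooled)
    return np.concatenate(parts)


# Toy inputs: random embeddings stand in for real word, entity,
# concept, and knowledge-graph embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
emb_dim, n_filters, width, seq_len = 8, 4, 3, 6
inputs = {lvl: rng.normal(size=(seq_len, emb_dim)) for lvl in LEVELS}
params = {
    lvl: {
        "filters": rng.normal(size=(width, emb_dim, n_filters)),
        "query": rng.normal(size=n_filters),
    }
    for lvl in LEVELS
}

representation = multi_level_representation(inputs, params)
print(representation.shape)  # (16,) = 4 levels x 4 filters
```

In a trained model the filters and attention queries would be learned jointly with a classifier that takes the concatenated vector as input; here they are random only to keep the sketch self-contained.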