Abstract
As a fundamental and typical task in natural language processing, text classification has been widely applied in many fields. However, most existing corpora, which form the basis of text classification, are imbalanced, and classifiers trained on them tend to favor the categories with more texts. In this paper, we propose a background knowledge based multi-stream neural network to compensate for the imbalanced or insufficient information caused by the limitations of the training corpus. The multi-stream network mainly consists of a basal stream, which retains the original sequence information, and background knowledge based streams. The background knowledge is composed of keywords and co-occurring words extracted from an external corpus. The background knowledge based streams supply supplementary information and reinforce the basal stream. To better fuse the features extracted from the different streams, an early-fusion strategy and two after-fusion strategies are employed. Results on both a Chinese corpus and an English corpus demonstrate that the proposed background knowledge based multi-stream neural network performs well in classification tasks.
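The abstract does not specify how the keywords and co-occurring words are extracted. The following is a minimal sketch under our own assumptions, not the authors' method: keywords are ranked by a TF-IDF score summed over the external corpus, co-occurring words are counted within a fixed token window, and each input text yields two background streams (the keywords it contains, and co-occurring words it does not contain). The function names `extract_keywords`, `cooccurrence`, `top_cooccurring` and `background_stream`, the window size, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def extract_keywords(docs, top_k=3):
    """Rank words by TF-IDF summed over an external corpus (assumed criterion)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: each word counted once per doc
    scores = Counter()
    for doc in docs:
        for word, count in Counter(doc).items():
            # A word present in every document gets idf = log(1) = 0.
            scores[word] += count * math.log(n_docs / df[word])
    # Descending score; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


def cooccurrence(docs, window=1):
    """Count how often each word appears within `window` tokens of another word."""
    counts = defaultdict(Counter)
    for doc in docs:
        for i, word in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][doc[j]] += 1
    return counts


def top_cooccurring(counts, word, top_n=2):
    """Return the `top_n` words that most often co-occur with `word`."""
    ranked = sorted(counts.get(word, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_n]]


def background_stream(tokens, keywords, counts, top_n=2):
    """Hypothetical background streams for one text: the keywords it contains,
    plus co-occurring words from the external corpus that it does not contain."""
    kw = set(keywords)
    keyword_stream = [t for t in tokens if t in kw]
    cooc_stream = []
    for tok in tokens:
        for w in top_cooccurring(counts, tok, top_n):
            if w not in tokens and w not in cooc_stream:
                cooc_stream.append(w)
    return keyword_stream, cooc_stream


# Toy "external corpus" of pre-tokenized documents (made-up data).
corpus = [
    "stock price rises stock".split(),
    "stock market falls".split(),
    "team wins match".split(),
]
keywords = extract_keywords(corpus, top_k=3)
counts = cooccurrence(corpus, window=1)
print(keywords)  # ['stock', 'falls', 'market']
print(background_stream(["stock", "rises"], keywords, counts))
# (['stock'], ['market', 'price'])
```

In this toy setup the short text "stock rises" gains "market" and "price" from the external corpus, which is the kind of supplementary signal the background knowledge based streams are meant to provide; a real system would use a much larger corpus and likely a stop-word filter.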
Highlights
In contemporary society, textual data are continuously increasing and have become one of the most commonly used information carriers [1].
To better incorporate background knowledge into feature selection and extraction, we propose a multi-stream neural network with different fusion strategies, mainly composed of a basal stream and background knowledge based streams.
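The text names an early-fusion strategy and two after-fusion strategies without defining them. As a minimal sketch, assuming each stream has already been reduced to a fixed-length feature vector and that classification is a linear layer followed by softmax, early fusion can concatenate the stream features before one shared classifier, while after fusion gives each stream its own classifier and combines the per-stream class probabilities. The two after-fusion rules used here (mean and element-wise max of probabilities) are illustrative stand-ins, not necessarily the paper's; all function names and weights are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """One logit per class; `weights` has one row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def early_fusion(stream_feats, weights, bias):
    """Concatenate all stream features, then apply a single shared classifier."""
    fused = [v for feats in stream_feats for v in feats]
    return softmax(linear(weights, bias, fused))


def _per_stream_probs(stream_feats, heads):
    # Each stream has its own (weights, bias) classifier head.
    return [softmax(linear(w, b, x)) for x, (w, b) in zip(stream_feats, heads)]


def after_fusion_mean(stream_feats, heads):
    """Average the class probabilities predicted by each stream."""
    probs = _per_stream_probs(stream_feats, heads)
    return [sum(col) / len(probs) for col in zip(*probs)]


def after_fusion_max(stream_feats, heads):
    """Take the element-wise max over streams, then renormalize to sum to 1."""
    probs = _per_stream_probs(stream_feats, heads)
    best = [max(col) for col in zip(*probs)]
    total = sum(best)
    return [v / total for v in best]


# Made-up 2-class example: one basal stream and one background stream.
basal, background = [1.0, 0.0], [0.0, 1.0]
feats = [basal, background]

early = early_fusion(feats, weights=[[2, 0, 0, 0], [0, 0, 0, 1]], bias=[0, 0])
heads = [([[1, 0], [0, 1]], [0, 0]),  # basal head
         ([[0, 0], [0, 3]], [0, 0])]  # background head
mean = after_fusion_mean(feats, heads)
peak = after_fusion_max(feats, heads)
print([round(p, 3) for p in early])  # [0.731, 0.269]
print([round(p, 3) for p in mean])   # [0.389, 0.611]
print([round(p, 3) for p in peak])   # [0.434, 0.566]
```

The design difference is where the streams meet: early fusion lets one classifier learn interactions between basal and background features, whereas after fusion keeps the streams independent and only reconciles their decisions, so a confident background stream can override a weak basal prediction, as in the example above.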
Summary
Textual data are continuously increasing and have become one of the most commonly used information carriers [1]. As an efficient information retrieval and data mining technology, text classification aims to associate a given document with one or more categories based on the extracted features. It has been widely used in many fields, such as sentiment analysis [2,3], stock analysis [4], and automatic news grouping. Results obtained on both a Chinese corpus and an English corpus demonstrate that the proposed background knowledge based multi-stream neural network performs well in classification tasks. Chinese corpus: People’s Daily contains 9100 news documents; the average number of texts in each class is …