Abstract

With the rapid advance of modern science and technology, automatic classification of long texts has become a hot topic in information retrieval and related data fields. However, text classification based on machine learning needs to be studied and evaluated comprehensively, covering algorithms, models, evaluation, and other aspects. In this paper, deep learning is applied to text classification. First, the conventional text classification pipeline is described, including data pre-processing, text segmentation, feature construction, and an overview of classification algorithms. Second, a CNN with multiple convolution kernels is applied to text classification: the traditional bag-of-words model is replaced by a word vector model, and multi-dimensional convolution kernels are used to extract text features. The experiments demonstrate the promising performance of CNN models for classifying text documents, but they also raise the following questions that call for more intensive research: 1) How can the proposed models and their performance be compared more effectively, and which text classification tasks is each model best suited to? 2) How can the data be better understood; in particular, what does the deep learning model actually learn? 3) Hyper-parameter tuning, which still relies largely on experience. 4) Label imbalance: when samples are unevenly distributed across labels, how should the sample weights in the loss be adjusted, in a manner similar to the bootstrap method? 5) What is the smallest neural network structure that can reach a given accuracy on a given dataset?
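
The abstract does not give the exact network configuration. As a minimal sketch, assuming the common design of several kernel widths sliding over consecutive word vectors, a ReLU activation, max-over-time pooling, and a softmax output layer, the forward pass of a CNN with multiple convolution kernels can be written in plain Python. The kernel widths (2, 3, 4), the number of filters per width, the random weights, the zero vector for unknown words, and the names `embed`, `conv_max_pool`, and `TextCNN` are all hypothetical illustrations, not the paper's setup; training by backpropagation is omitted.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def embed(tokens: List[str], table: Dict[str, Vector], dim: int) -> List[Vector]:
    """Replace each token with its word vector (assumption: unknown words -> zero vector)."""
    return [table.get(t, [0.0] * dim) for t in tokens]


def conv_max_pool(seq: List[Vector], kernel: List[Vector], bias: float) -> float:
    """Slide one kernel of width len(kernel) over the sequence, apply ReLU,
    and keep the maximum activation over all positions (max-over-time pooling)."""
    width = len(kernel)
    if len(seq) < width:
        return 0.0  # assumption: a text shorter than the kernel yields a zero feature
    best = -math.inf
    for start in range(len(seq) - width + 1):
        s = bias
        for offset in range(width):
            s += sum(w * x for w, x in zip(kernel[offset], seq[start + offset]))
        best = max(best, max(0.0, s))  # ReLU, then running max over time
    return best


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TextCNN:
    """Hypothetical untrained text CNN: several kernel widths -> pooled features -> softmax."""

    def __init__(self, dim: int, widths=(2, 3, 4), filters_per_width: int = 2,
                 num_classes: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.kernels = []  # list of (kernel, bias); each kernel is width x dim
        for w in widths:
            for _ in range(filters_per_width):
                kernel = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(w)]
                self.kernels.append((kernel, 0.0))
        n_feat = len(self.kernels)
        self.out_w = [[rng.uniform(-0.5, 0.5) for _ in range(n_feat)] for _ in range(num_classes)]
        self.out_b = [0.0] * num_classes

    def features(self, seq: List[Vector]) -> List[float]:
        # One pooled value per kernel gives a fixed-length vector for any text length.
        return [conv_max_pool(seq, k, b) for k, b in self.kernels]

    def predict_proba(self, seq: List[Vector]) -> List[float]:
        feats = self.features(seq)
        logits = [b + sum(w * f for w, f in zip(row, feats))
                  for row, b in zip(self.out_w, self.out_b)]
        return softmax(logits)


if __name__ == "__main__":
    rng = random.Random(1)
    dim = 4
    vocab = ["the", "movie", "was", "great", "awful"]
    table = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}  # toy word vectors
    model = TextCNN(dim=dim)
    seq = embed("the movie was great".split(), table, dim)
    print(model.predict_proba(seq))  # class probabilities from the untrained model
```

Each kernel width behaves like an n-gram detector over consecutive word vectors, and max-over-time pooling keeps one score per kernel, so documents of different lengths map to feature vectors of the same size before classification.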

