Abstract
Classifying documents according to their contents is important for finding necessary documents efficiently. Document similarity estimation is a key technique for achieving good classification, since classification is performed on the basis of document similarity. In natural language processing, the bag-of-words model is commonly used to extract features from documents, and values based on term occurrence frequency are used as the weights of those features. However, such term-weighting methodologies usually rely on predefined models and have certain limitations. New approaches that construct feature vectors from the data distribution are desired to achieve high performance in natural language processing. Recently, many researchers have paid attention to deep learning, an approach that transforms raw data into feature vectors using large amounts of unlabeled data. This characteristic is well suited to the need described above. In natural language processing, a main aim is to construct a language model on a deep-architecture neural network. In this paper, we use a deep-architecture neural network to estimate document similarity. To obtain good article similarity estimates, we must generate article vectors that represent the characteristics of each article. Hence, we train the deep-architecture neural network on a large collection of stock market news articles and generate article vectors with the trained network. We then calculate the cosine similarity between labeled articles and discuss the performance of the network. In the evaluation, we focus not on the articles' contents but on their sentiment polarity, and we discuss whether the proposed method classifies articles according to that polarity. We confirmed that, although the proposed method is an unsupervised learning approach, it achieves good performance in stock market news similarity estimation. The results show that a deep-architecture neural network can be applied to further natural language processing tasks.
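As a minimal illustration of the similarity step described above, the sketch below computes the cosine similarity between two article vectors. The vectors and their dimensionality are made up for the example; in the paper they would be produced by the trained deep-architecture neural network.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two article vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical article vectors; in practice these come from the trained network.
vec_article_1 = np.array([0.12, -0.38, 0.05, 0.91])
vec_article_2 = np.array([0.10, -0.35, 0.02, 0.88])

# A value close to 1.0 indicates that the two articles are estimated to be similar.
print(cosine_similarity(vec_article_1, vec_article_2))
```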