Abstract

This paper involves deriving high quality information from unstructured text data through the integration of rich document representations to improve machine learning text classification problems. Previous research has applied Neural Network Language Models (NNLMs) to document classification performance, and word vector representations have been used to measure semantics among text. Never have they been combined together and shown to have improved text classification performance. Our belief is that the inference and clustering abilities of word vectors coupled with the power of a neural network can create more accurate classification predictions. The first phase our work focused on word vector representations for classification purposes. This approach included analyzing two distinct text sources with pre-marked binary outcomes for classification, creating a benchmark metric, and comparing against word vector representations within the feature space as a classifier. The results showed promise, obtaining an area under the curve of 0.95 utilizing word vectors, relative to the benchmark case of 0.93. The second phase of the project focused on utilizing an extension of the neural network model used in phase one to represent a document in its entirety as opposed to being represented word by word. Preliminary results indicated a slight improvement over the baseline model of approximately 2–3 percent.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.