Abstract

This paper presents a method and process for building a machine learning system that uses a Deep Neural Network (DNN) for lexical analysis of text. Part-of-speech (POS) tagging of words is important in natural language processing, whether for speech technology or machine translation. Recent advances in deep neural networks help achieve better results in POS tagging of words and phrases. The Word2vec tool of the Dl4j library is widely used to represent words in a continuous vector space, and these vectors capture the syntactic and semantic meaning of the corresponding words. Given a database of sample words with their POS categories, it is possible to assign POS tags to words, but this lookup fails when a word is not present in the database. Cosine similarity plays an important role in finding POS tags for words and phrases that have not previously been trained or tagged. Using cosine similarity, the system assigns an appropriate POS tag to such a word by finding its nearest similar words among the vectors trained with Word2vec. Deep neural networks such as recurrent neural networks (RNNs) outperform traditional state-of-the-art approaches because they address the issue of word sense disambiguation. Semi-supervised learning is used to train the network. This approach is applicable to Indian languages as well as foreign languages. In this paper, an RNN is implemented to build a machine learning system for POS tagging of words in English-language sentences.
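The cosine-similarity fallback described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`cosine_similarity`, `tag_word`), the three-dimensional toy vectors, and the tag labels are all hypothetical, standing in for embeddings that would in practice come from a trained Word2vec model. A word found in the tagged database gets its stored tag; an unseen word borrows the tag of its nearest tagged neighbour.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def tag_word(word, vectors, tagged):
    """Return a POS tag for `word`.

    vectors: dict mapping word -> embedding (assumed to come from Word2vec)
    tagged:  dict mapping word -> POS tag (the sample database)
    """
    if word in tagged:
        return tagged[word]  # direct database lookup
    if word not in vectors:
        return None  # no embedding, so no neighbours to compare against
    query = vectors[word]
    candidates = [w for w in tagged if w in vectors]
    if not candidates:
        return None
    # Borrow the tag of the most similar tagged word.
    nearest = max(candidates, key=lambda w: cosine_similarity(query, vectors[w]))
    return tagged[nearest]


# Hypothetical toy data, for illustration only.
TOY_VECTORS = {
    "run": [0.9, 0.1, 0.0],
    "walk": [0.8, 0.2, 0.1],
    "dog": [0.1, 0.9, 0.1],
    "cat": [0.0, 0.95, 0.2],
    "jog": [0.85, 0.15, 0.05],  # has an embedding but no tag
}
TOY_TAGS = {"run": "VB", "walk": "VB", "dog": "NN", "cat": "NN"}

print(tag_word("jog", TOY_VECTORS, TOY_TAGS))
```

Here "jog" is absent from the tagged database, so its tag is taken from the tagged word whose vector is closest in direction. A single nearest neighbour is the simplest choice; voting over several neighbours would be a natural variation.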
