Abstract

Word embedding is a fundamental task in text mining, a subfield of Natural Language Processing (NLP). Feature engineering, also referred to as feature extraction, deals with building features out of existing data: it constructs the right features from the dataset and feeds the most relevant ones into a machine learning model for training. Feature engineering yields substantial improvements on many NLP tasks; its main aspects include feature selection, feature addition, feature filtering, and feature scaling. The skip-gram architecture performs better than other feature extraction models such as CBOW and TF-IDF. This unsupervised learning technique predicts the context words for a given target word. The skip-gram word2vec model overcomes shortcomings associated with the word-embedding matrix, such as loss of information from earlier cells, noisy information, and over-fitting. The skip-gram algorithm trains a neural network whose output at each step is a softmax vector, and it computes the cross-entropy loss between that output vector and the true one-hot encoding. The skip-gram word2vec model also yields good representations for rare words and phrases. To improve computational speed and accuracy, the parameters of the skip-gram model are fine-tuned. Experimental results demonstrate the efficiency of the model.
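To make the training procedure described above concrete, the following is a minimal NumPy sketch of skip-gram training with a full softmax output and cross-entropy loss against the one-hot context word. It is not the paper's implementation; the toy corpus, embedding dimension, window size, and learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus and vocabulary (hypothetical example data).
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
word_to_idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)   # vocabulary size
D = 10           # embedding dimension (illustrative choice)
window = 2       # context window size
lr = 0.05        # learning rate

# Input (target) embeddings and output (context) weights.
W_in = rng.normal(scale=0.1, size=(V, D))
W_out = rng.normal(scale=0.1, size=(D, V))

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

for epoch in range(100):
    loss = 0.0
    for pos, word in enumerate(corpus):
        t = word_to_idx[word]
        # Each word within the window around the target is a context word.
        for off in range(-window, window + 1):
            c_pos = pos + off
            if off == 0 or c_pos < 0 or c_pos >= len(corpus):
                continue
            c = word_to_idx[corpus[c_pos]]
            h = W_in[t]                    # hidden layer = target embedding
            y = softmax(h @ W_out)         # softmax vector over the vocabulary
            loss += -np.log(y[c] + 1e-12)  # cross-entropy vs. one-hot context
            # Gradient of the cross-entropy w.r.t. the softmax input: y - one_hot(c).
            grad = y.copy()
            grad[c] -= 1.0
            W_in[t] -= lr * (W_out @ grad)
            W_out -= lr * np.outer(h, grad)

print("final epoch loss:", loss)
```

Note that a full softmax over the vocabulary is expensive for realistic vocabulary sizes, which is why practical word2vec implementations replace it with negative sampling or hierarchical softmax; that substitution is one of the standard speed-oriented tuning choices for the skip-gram model.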
