Abstract

Natural language processing (NLP) for Malayalam is difficult because the language is highly inflectional and agglutinative: a single root appears in many surface forms, which hurts the performance of Malayalam NLP applications. To improve these applications, this work improves the word embeddings trained on a Malayalam corpus. The improvement consists of standardising the corpus by removing the inflectional parts of every word, so that only root words remain. This requires a stemmer; in this project I have used a Malayalam morphological analyser to extract the root of each word in the existing corpus. Removing the inflectional parts reduces the sparsity of the corpus and sharply increases the frequency of the remaining word forms, which in turn lowers the space and time complexity of the word embedding representation. Zipf's law is a discrete probability distribution that gives the probability of encountering a word in a given corpus. Based on Zipf's law, I propose that increasing word frequencies in this way improves neural word embeddings for Malayalam. The embeddings are trained with fastText, which captures a dense word vector representation of the corpus, reduced in dimensionality relative to the sparse word co-occurrence matrix. The improved embeddings are mainly intended for WordNet, analogy, and ontology-based Malayalam applications.

Index Terms: Morphological Analyzer, Zipf's law, Preprocessing, Testing, Training
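The effect of root-word normalisation on sparsity can be illustrated with a minimal sketch. It is not the project's actual pipeline: the `ROOTS` lookup table is a hypothetical stand-in for the Malayalam morphological analyser, and the transliterated tokens are illustrative only.

```python
from collections import Counter

# Hypothetical lookup table standing in for a real Malayalam morphological
# analyser. The transliterated word forms are illustrative assumptions.
ROOTS = {
    "veettil": "veedu",
    "veedukal": "veedu",
    "veedinte": "veedu",
    "pusthakangal": "pusthakam",
    "pusthakathil": "pusthakam",
}


def to_root(token):
    """Return the root form of a token, or the token itself if unknown."""
    return ROOTS.get(token, token)


def normalise(tokens):
    """Replace every token in the corpus by its root form."""
    return [to_root(t) for t in tokens]


corpus = (
    "veedu veettil veedukal veedinte "
    "pusthakam pusthakangal pusthakathil"
).split()

before = Counter(corpus)            # frequencies of the inflected forms
after = Counter(normalise(corpus))  # frequencies after normalisation

print("distinct forms before:", len(before))
print("distinct forms after: ", len(after))
print("frequencies after:    ", dict(after))
```

In this toy corpus the vocabulary shrinks while the frequency of each remaining form rises, which is the reduction in sparsity the abstract describes. The normalised tokens could then be written to a text file and passed to an embedding trainer such as fastText.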
