Abstract
As social media networks have grown in prominence, they have transformed how people live and communicate, and people in multilingual societies increasingly use these platforms. Research communities have recently begun using code-mixed data for NLP tasks that involve multiple languages. This paper analyzes text representation for code-mixed and code-switched text that combines two languages with different native scripts, English (Roman script) and Hindi (Devanagari), written in a mixed script. Three tasks are considered: word-level language identification; quantifying which language a sentence or text is written in and which words are ambiguous, that is, share one spelling but carry two or three meanings in the mixed script; and normalizing spelling variation in Romanized Hindi, where users write the same word with different spellings in daily communication. These tasks are studied using word embedding and text representation techniques: word2vec, in its skip-gram and continuous bag-of-words (CBOW) variants, and TF-IDF. An approach that uses character-based embedding to process ambiguous words in code-mixed text is proposed, and it shows promising results for spelling variation and language identification.
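To illustrate why character-level representations help with spelling variation in Romanized Hindi, the following is a minimal sketch and not the method proposed in this paper. It assumes count-based character trigram vectors as a simple stand-in for learned character-based embeddings, and the example words ("nahi", "nahin", "school") as well as the helper names `char_ngrams` and `cosine` are hypothetical choices made for illustration only.

```python
from collections import Counter
from math import sqrt


def char_ngrams(word, n=3):
    """Return a bag of character n-grams for a word.

    The word is lowercased and wrapped in '<' and '>' so that
    prefixes and suffixes get their own n-grams.
    """
    padded = f"<{word.lower()}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative (assumed) inputs: two Roman spellings of the same Hindi
# word, and an unrelated English word.
variant_similarity = cosine(char_ngrams("nahi"), char_ngrams("nahin"))
unrelated_similarity = cosine(char_ngrams("nahi"), char_ngrams("school"))

print(f"nahi vs nahin:  {variant_similarity:.2f}")
print(f"nahi vs school: {unrelated_similarity:.2f}")
```

Because the two spelling variants share most of their character trigrams, their similarity is much higher than that of the unrelated pair. A whole-word vocabulary lookup would treat the two spellings as entirely distinct tokens, which is the motivation for working at the character level.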