Abstract

The Internet has emerged as a pivotal medium for human interaction, profoundly transforming language dynamics, especially on computer-mediated communication (CMC) platforms. These transformations are reflected in the changing orthography, graphology, vocabulary, grammar, syntax, pragmatics, and style of natural languages. As Internet-based communication has become more accessible, users communicate in the language that comes most naturally to them and coin new terms and phrases to make their messages more engaging. This worldwide trend in Internet language is known as Internese. The features of Internese are vividly marked in Malayalam social media comments, where typographical errors are also common. None of the spell checkers currently available for Malayalam recognizes the words and phrases used in social media comments written in Malayalam. This study proposes two models for automatic typo detection and correction in Malayalam social media comments: a sequence-to-sequence deep learning model and a hybrid model. Both models analyze input words at the phonogram level. The hybrid system achieves acceptable performance on a recently built corpus, and the sequence-to-sequence model also performs reasonably well. Because of Malayalam's intricate structure, however, this model's heavy reliance on data affects its accuracy.
