Abstract

Short Message Service (SMS) messaging on cell phones is a widely used mode of data communication. Currently employed encoding schemes allow the transmission of 160 characters per SMS in English. This drops to 70 characters per SMS when any Indian language, including Hindi, is used, because such text is sent in Unicode format. Schemes proposed to improve the encoding efficiency of short text messaging generally encode one character at a time, typically using table-splitting schemes that reduce the average number of bits per character. In this paper, a novel multi-character, frequency-based encoding scheme is proposed for efficient transmission of short text messages in four Indian languages. Both uni-gram and bi-gram modelling based schemes are proposed. The efficiency of the proposed schemes is evaluated, using a dictionary learning approach, through experiments on a large multilingual database of short text messages collected from Twitter. Performance evaluation shows that these encoding schemes allow the transmission of around 190 characters per SMS in English and more than 165 characters per SMS in the four Indian languages. Encoding efficiency is significantly improved compared with existing state-of-the-art table marker algorithms, and the gain is large enough to motivate practical use for transmitting short text messages in Indian languages.
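The character limits quoted in the abstract follow from simple arithmetic, sketched below. The sketch assumes the standard 140-octet (1120-bit) user data field of a single SMS, which the abstract does not state explicitly, and the helper names `chars_per_sms` and `avg_bits_per_char` are hypothetical. The average code lengths it derives are only what the reported capacities would imply under that assumption; they are not figures taken from the paper.

```python
# Assumption: one SMS carries 140 octets of user data (not stated in the abstract).
SMS_PAYLOAD_BITS = 140 * 8  # 1120 bits


def chars_per_sms(bits_per_char: float) -> int:
    """Whole characters that fit in one SMS at a given average code length."""
    return int(SMS_PAYLOAD_BITS // bits_per_char)


def avg_bits_per_char(chars: int) -> float:
    """Average bits per character implied by fitting `chars` characters in one SMS."""
    return SMS_PAYLOAD_BITS / chars


# Baselines mentioned in the abstract.
print(chars_per_sms(7))    # 7-bit default alphabet, English text
print(chars_per_sms(16))   # 16-bit Unicode (UCS-2), Indian-language text

# Average code lengths implied by the capacities the abstract reports.
print(round(avg_bits_per_char(190), 2))  # English, around 190 characters
print(round(avg_bits_per_char(165), 2))  # Indian languages, more than 165 characters
```

Under this assumption, reaching 165 characters per SMS in an Indian language means averaging fewer than 7 bits per character, compared with 16 bits per character for plain Unicode transmission.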
