Abstract

Compression of short text strings, such as GSM Short Message Service (SMS) and Twitter messages, has received relatively little attention compared to the compression of longer texts. This is not surprising given that for typical cellular and internet-based networks, the cost of compression probably outweighs the cost of delivering uncompressed messages. However, this is not necessarily true where the cost of data transport is high, for example, where satellite back-haul is involved, or on bandwidth-starved mobile mesh networks, such as the mesh networks for disaster relief, rural, remote and developing contexts envisaged by the Serval Project [1-4]. This motivated the development of a state-of-the-art text compression algorithm that could be used to compress mesh-based short-message traffic, culminating in the stats3 SMS compression scheme described in this paper. Stats3 uses word frequency and 3rd-order letter statistics embodied in a pre-constructed dictionary to effect lossless compression of short text messages. We show that our scheme typically reduces messages to less than half of their original size, and in so doing substantially outperforms all public SMS compression systems, while also matching or exceeding the marketing claims of the commercial options known to the authors. We also outline approaches for future work that have the potential to further improve the performance and practical utility of stats3.

Highlights

  • Loss-less compression is a mature field, with a wide variety of methodologies and implementations

  • The GSM Association (GSMA) did create a standard for Short Message Service (SMS) compression, GSM 03.42 [5], but it is difficult to ascertain whether it has been widely adopted by carriers, and the standard itself has not been updated since 1999 apart from being carried forward into the corresponding LTE standard [6], even though attractive compression technologies such as Arithmetic Coding [712] have lapsed from patent encumbrance in the meantime

  • Perhaps for mobile telecommunications carriers, the disinterest is commercial; they charge per message unit of 160 7-bit characters, i.e., 140 8-bit bytes, and reducing the number of messages required for parties to communicate would be revenue-negative
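The message-unit arithmetic in the last highlight (160 7-bit characters occupying 140 8-bit bytes) can be checked with a short sketch. This is an illustration only and is not part of stats3: the function names `packed_size` and `pack_septets` are hypothetical, and the least-significant-bits-first packing order is an assumption based on the common GSM 7-bit convention.

```python
def packed_size(septets: int) -> int:
    """Number of 8-bit bytes needed to carry `septets` 7-bit characters."""
    return (septets * 7 + 7) // 8  # round up to a whole byte


def pack_septets(values) -> bytes:
    """Pack 7-bit values into bytes, filling each byte from its low bits.

    Assumption: this mirrors the usual GSM 7-bit packing order; it is a
    sketch for checking sizes, not a full SMS encoder.
    """
    acc = 0      # bit accumulator, oldest bits in the low positions
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for v in values:
        if not 0 <= v < 128:
            raise ValueError("not a 7-bit value")
        acc |= v << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # final partial byte, zero-padded
    return bytes(out)


if __name__ == "__main__":
    print(packed_size(160))                 # bytes for one full message unit
    print(len(pack_septets([0x41] * 160)))  # same figure via actual packing
```

Any compression scheme therefore has to fit its output into the same 140-byte payload per message unit, which is why savings on such short inputs are measured in single bytes.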


Summary

Introduction

Loss-less compression is a mature field, with a wide variety of methodologies and implementations. The high cost that mobile telecommunications carriers charge per message unit introduces the potential for SMS compression to be revenue-positive. The Serval Project [2,3,4] is a mobile ad-hoc network that provides resilient communications for rural, remote and disaster situations. In such networks, bandwidth is often limited, clusters of nodes may be isolated from one another, and satellite SMS services may be the backhaul of last resort. The remainder of this paper briefly explores the technical challenges of compressing short messages, surveys existing compression schemes appropriate for use on SMS-length messages, and introduces our new stats3 SMS compression scheme. A comparison with the existing state of the art shows that our scheme substantially outperforms all public SMS compression systems, and matches or outperforms the unverified marketing claims of the commercial offerings. We outline several areas for future work that we believe have the potential to further improve the performance of stats3.

Challenges of Compressing of Short
Existing Short Message Compression Schemes
ShortBWT
Commercial Short Message Compression Systems
Summary
Overview of the Stats3 Short Message Compression Scheme
Dictionary Generation
Model Selection
Results & Conclusions
Future Work
