Preserving quality of information by using semantic relationships

P Basu, J Bao, M Dean, J Hendler

doi:10.1016/j.pmcj.2013.07.013

Abstract

In pervasive computing and sensing applications, a multitude of devices such as sensors and processors (that perform fusion) serve as rich sources of data and information over long periods of time. The information streams generated inside an application are often not independent of each other; instead, they are linked by semantic relationships. To deal with the high volumes of information generated over time, it is sometimes necessary to compress these streams. However, what is critical for maintaining an acceptable level of information quality is often the underlying meaning, or semantics, of the information rather than the actual data in its entirety. In this paper, we show how semantic redundancy and ambiguity within a semantically-aware source can be exploited to achieve compression, with the goal of recovering the meaning underlying its messages. We take preliminary steps toward extending the source coding principles of classical information theory and show that, by utilizing semantic inference relations between probabilistically expressed messages and underlying models at the source, a higher rate of compression, albeit lossy, may be achieved than with traditional syntactic compression methods. We define a “semantic entropy” measure for a source and show that it is bounded from above by the mutual information between its models and the syntactic messages it generates. We also consider some simple graph-based semantic inference relationships derived from propositional logic and give practical algorithms that exploit the graph structure of a shared knowledge base to facilitate lossless semantic compression.

Full Text