Abstract
This paper studies the fundamental limits of storage for structured data, where statistics and structure are both critical to the application. Accordingly, a framework is proposed for optimal lossless and lossy compression of subsets of the possible realizations of a discrete memoryless source (DMS). For the lossless subset-compression problem, it turns out that the optimal source code may not index the conventional source-typical sequences, but rather index certain subset-typical sequences consistent with the subset of interest. Building upon an achievability and a strong converse, an analytic expression is given, based on the Shannon entropy, relative entropy, and subset entropy, which identifies such subset-typical sequences for a broad class of subsets of a DMS. Intuitively, subset-typical sequences belong to those typical sets which highly intersect the subset of interest but are still closest to the source distribution in the sense of relative entropy. For the lossy subset-compression problem, an upper bound is derived on the subset rate-distortion function in terms of the subset mutual information optimized over the set of conditional distributions that satisfy the expected distortion constraint with respect to the subset-typical distribution and over a set of certain auxiliary subsets. By proving a strong converse result, this upper bound is shown to be tight for a class of symmetric subsets. As shown in our numerical examples, more often than not, one achieves a gain in the fundamental limits, in that the optimal compression rate for the subset in both the lossless and lossy settings can be strictly smaller than the source entropy and the source rate-distortion function, respectively, although exceptions are also possible.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.