Abstract

Data compression has been an enabling technology for the ongoing digital multimedia revolution for decades, yielding renowned algorithms such as Huffman encoding, LZ77, Gzip, RLE and JPEG. Researchers have explored character/word-based approaches to text and image compression, missing the larger opportunity of mining patterns from large databases. The central theme of our compression research is the compression perspective of data mining suggested by Naren Ramakrishnan et al., wherein efficient versions of seminal text/image compression algorithms are developed using various Frequent Pattern Mining (FPM) and clustering techniques. This paper proposes a family of novel and hybrid text and image compression algorithms employing efficient data structures such as hashes and graphs. We retrieve an optimal set of patterns through pruning, which is efficient in terms of database scans and storage space because it reduces the code table size. Moreover, a detailed analysis of time and space complexity is performed for some of our approaches, and various text structures are proposed. Simulation results over various sparse/dense benchmark text corpora indicate an 18% to 751% improvement in compression ratio over other state-of-the-art techniques. In image compression, our results show up to 45% improvement in compression ratio and up to 40% improvement in image quality efficiency.

Highlights

  • Frequent Pattern Mining (FPM) is a non-trivial phase of Association Rule Mining (ARM) and is formally defined as follows: let I = {i1, i2, i3, ..., in} be a set of items and TD = 〈T1, T2, T3, ..., Tm〉 a transaction database, where each Ti (i ∈ [1..m]) is a transaction containing a set of items in I; FPM finds the itemsets whose support in TD meets a user-specified minimum threshold (a minimal support-counting sketch follows this list)

  • Section 6 presents simulation results and discussion of the proposed text and image compression algorithms, including (i) compression ratio (Cr) and time performance of the text compression algorithms; simulations were performed on an Intel Core i5-3230M CPU at 2.26 GHz with 4 GB main memory and a 500 GB hard disk, running Ubuntu 13.04

  • The introduction of fmod leads to the optimal set of frequent patterns, which reduces the code table size in FPH-HB, whereas in GA78 the code table size is reduced because the graph approach directly mines the required patterns
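
To make the FPM definition in the first highlight concrete, the following is a minimal brute-force support-counting sketch in Python. It is not the paper's FPH-HB or GA78 algorithm; the function name, the min_support threshold and the toy transaction database are illustrative assumptions.

```python
from itertools import combinations
from collections import Counter

def frequent_patterns(transactions, min_support, max_len=3):
    """Enumerate itemsets of size 1..max_len whose support
    (number of transactions containing them) meets min_support."""
    frequent = {}
    for k in range(1, max_len + 1):
        counts = Counter()
        for t in transactions:
            for itemset in combinations(sorted(set(t)), k):
                counts[itemset] += 1
        level = {s: c for s, c in counts.items() if c >= min_support}
        if not level:   # no frequent itemset of size k, so none larger
            break       # exists either (anti-monotonicity of support)
        frequent.update(level)
    return frequent

# Toy transaction database TD over items I = {a, b, c, d}
TD = [['a', 'b', 'c'], ['a', 'b'], ['a', 'c', 'd'], ['b', 'c']]
print(frequent_patterns(TD, min_support=2))
# {('a',): 3, ('b',): 3, ('c',): 3, ('a','b'): 2, ('a','c'): 2, ('b','c'): 2}
```

A practical miner would avoid re-scanning the database per level (as Apriori and FP-growth do); the sketch only illustrates the support-threshold definition above.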

Summary

Introduction

Huffman encoding, a seminal algorithm for lossless data compression developed by David A. Huffman, assigns variable-length codes to characters based on their probability of occurrence (Huffman, 1952). Lossless decoding relies on the property that the generated codes are prefix codes. Lossy compression is a class of data encoding methods that uses inexact approximations; it finds extensive application in the transfer and storage of multimedia data such as image, audio and video, where slight data loss might not be noticed by the end user. This work focuses on efficient data mining approaches to text and image compression. Data mining generates a reduced (smaller) set of patterns (knowledge) from the original database, a process that can itself be viewed as a compression technique.
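
As a minimal illustration of the variable-length prefix codes described above, here is a sketch of the classical Huffman construction in Python. This is the standard textbook algorithm, not the paper's FPM-based variants; the example string is an assumption.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix-code table: frequent characters get shorter codes."""
    freq = Counter(text)
    # Heap entries: (frequency, tie-breaker, {char: code-so-far}).
    # The integer tie-breaker keeps tuples comparable when frequencies tie.
    heap = [(f, i, {ch: ''}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, right = heapq.heappop(heap)
        # Prepend 0/1 to every code in the merged subtrees.
        merged = {ch: '0' + c for ch, c in left.items()}
        merged.update({ch: '1' + c for ch, c in right.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
encoded = ''.join(codes[ch] for ch in "abracadabra")
print(codes)    # e.g. {'a': '0', 'b': '110', 'r': '111', 'c': '100', 'd': '101'}
print(encoded)  # 23 bits versus 88 bits of 8-bit ASCII
```

Because no code is a prefix of another, the bit stream can be decoded unambiguously by walking it left to right, which is the property the introduction refers to.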

Related Work
Preliminaries and Problem Definition
Time Complexity Analysis of CBH
Space Complexity Analysis of CBH
Characterization of Text Structures
Time and Space Complexity of Various Text Structures for the FPH Approach
Conclusions and Future Work