Abstract

Data compression has been an enabling technology for the ongoing digital multimedia revolution for decades, yielding renowned algorithms such as Huffman encoding, LZ77, Gzip, RLE and JPEG. Researchers have explored character/word-based approaches to text and image compression, missing the larger opportunity of mining patterns from large databases. The central theme of our compression research is the compression perspective of data mining suggested by Naren Ramakrishnan et al., wherein efficient versions of seminal text/image compression algorithms are developed using various Frequent Pattern Mining (FPM) and clustering techniques. This paper proposes a family of novel and hybrid text and image compression algorithms employing efficient data structures such as hashes and graphs. We retrieve an optimal set of patterns through pruning, which is efficient in terms of database scans and storage space because it reduces the code table size. Moreover, a detailed analysis of time and space complexity is performed for some of our approaches, and various text structures are proposed. Simulation results over various sparse/dense benchmark text corpora indicate an 18% to 751% improvement in compression ratio over other state-of-the-art techniques. In image compression, our results show up to 45% improvement in compression ratio and up to 40% improvement in image quality efficiency.

Highlights

  • Frequent Pattern Mining (FPM) is a non-trivial phase of Association Rule Mining (ARM) and is formally defined as follows: let I = {i1, i2, i3, ..., in} be a set of items and TD = 〈T1, T2, T3, ..., Tm〉 a transaction database, where each Ti (i ∈ [1..m]) is a transaction containing a set of items in I; FPM finds the itemsets whose support in TD meets a user-specified minimum threshold (a minimal support-counting sketch follows this list)

  • Section 6 presents simulation results and discussion of the proposed text and image compression algorithms, including (i) compression ratio (Cr) and time performance of the text compression algorithms; simulations were performed on an Intel Core i5-3230M CPU at 2.26 GHz with 4 GB main memory and a 500 GB hard disk, running Ubuntu 13.04

  • The introduction of fmod leads to the optimal set of frequent patterns, which reduces the code table size in FPH-HB, whereas in GA78 the code table size is reduced because the graph approach directly mines the required patterns
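
To make the FPM definition in the first highlight concrete, the following is a minimal brute-force support-counting sketch in Python. It is not the paper's FPH-HB or GA78 algorithm; the function name, the min_support threshold and the toy transaction database are illustrative assumptions.

```python
from itertools import combinations
from collections import Counter

def frequent_patterns(transactions, min_support, max_len=3):
    """Enumerate itemsets of size 1..max_len whose support
    (number of transactions containing them) meets min_support."""
    frequent = {}
    for k in range(1, max_len + 1):
        counts = Counter()
        for t in transactions:
            for itemset in combinations(sorted(set(t)), k):
                counts[itemset] += 1
        level = {s: c for s, c in counts.items() if c >= min_support}
        if not level:   # no frequent itemset of size k, so none larger
            break       # exists either (anti-monotonicity of support)
        frequent.update(level)
    return frequent

# Toy transaction database TD over items I = {a, b, c, d}
TD = [['a', 'b', 'c'], ['a', 'b'], ['a', 'c', 'd'], ['b', 'c']]
print(frequent_patterns(TD, min_support=2))
# {('a',): 3, ('b',): 3, ('c',): 3, ('a','b'): 2, ('a','c'): 2, ('b','c'): 2}
```

A practical miner would avoid re-scanning the database per level (as Apriori and FP-growth do); the sketch only illustrates the support-threshold definition above.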

Summary

Introduction

Huffman encoding, a seminal algorithm for lossless data compression developed by David A. Huffman, assigns variable-length codes to characters based on their probability of occurrence (Huffman, 1952). Lossless decoding relies on the property that the generated codes are prefix codes. Lossy compression is a class of data encoding methods that uses inexact approximations; it finds extensive application in the transfer and storage of multimedia data such as image, audio and video, where slight data loss might not be noticed by the end user. This work focuses on efficient data mining approaches to text and image compression. Data mining generates a reduced (smaller) set of patterns (knowledge) from the original database, a process that can itself be viewed as a compression technique.
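
As a minimal illustration of the variable-length prefix codes described above, here is a sketch of the classical Huffman construction in Python. This is the standard textbook algorithm, not the paper's FPM-based variants; the example string is an assumption.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix-code table: frequent characters get shorter codes."""
    freq = Counter(text)
    # Heap entries: (frequency, tie-breaker, {char: code-so-far}).
    # The integer tie-breaker keeps tuples comparable when frequencies tie.
    heap = [(f, i, {ch: ''}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, right = heapq.heappop(heap)
        # Prepend 0/1 to every code in the merged subtrees.
        merged = {ch: '0' + c for ch, c in left.items()}
        merged.update({ch: '1' + c for ch, c in right.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
encoded = ''.join(codes[ch] for ch in "abracadabra")
print(codes)    # e.g. {'a': '0', 'b': '110', 'r': '111', 'c': '100', 'd': '101'}
print(encoded)  # 23 bits versus 88 bits of 8-bit ASCII
```

Because no code is a prefix of another, the bit stream can be decoded unambiguously by walking it left to right, which is the property the introduction refers to.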

Related Work
Preliminaries and Problem Definition
Time Complexity Analysis of CBH
Space Complexity Analysis of CBH
Characterization of Text Structures
Time and Space Complexity of Various Text Structures for the FPH Approach
Conclusions and Future Work