Design and Development of Bioinformatics Feature Based DNA Sequence Data Compression Algorithm

Kakoli Banerjee, Vikram Bali

doi:10.4108/eai.13-7-2018.164097

Open Access

Abstract

INTRODUCTION: Genetic data plays a key role in healthcare, but it is typically very large. Many studies show that the absence of DNA information at the right time is one of the major causes of error in healthcare. The more genomic information analysts acquire, the better the prospects for individual and public health. Preserving and retrieving genetic information in the right form within the required time is a major challenge in healthcare. Already, prenatal DNA tests screen for developmental abnormalities. Soon, patients will have their blood sequenced to detect any nonhuman DNA that may signal an infectious disease. In the future, a person with cancer will likely be able to track the progression of the disease by having the DNA and RNA of single cells from various tissues sequenced every day. DNA sequencing of whole populations will give a more complete and accurate prediction of population health.

OBJECTIVES: Genetic data is growing exponentially, so it is hard to manage the constantly growing genetic databases. The human genome in its base configuration occupies almost thirty terabytes of storage space. Computational resources are limited: not only storage, but also transmission capacity and run-time memory. Data compression is a challenge when genetic information is expanding exponentially, and it is critical to preserve the integrity of genetic information while compressing it. Hence the main objective of this paper is to develop a lossless DNA compression algorithm that not only gives better compression but also helps in retrieving information for efficient use in healthcare.

METHODS: This paper proposes a lossless genetic data compression method. The proposed algorithm works in horizontal mode and uses a reference-based substitution technique for compression. The main idea of this paper lies in the kind of similarity that is searched for. All the prevailing genetic compression methods search for similarity within a chromosome. These algorithms follow either horizontal mode or vertical mode to achieve compression. But whichever mode they use, they are all based on searching for similarities within a single chromosome, i.e. they exploit only intra-chromosomal similarities. The current study shows that the compression ratio achieved by analyzing each chromosome individually is always lower than that of a method that also analyzes and compresses inter-chromosomal similarities, i.e. similarities between different chromosomes.

RESULTS: This study shows that simply using exactly matching repeats among all the chromosomes of the same genome not only improves the compression ratio but also enables a detailed study of the similarities and differences between two genomes of the same species.

CONCLUSION: In this study, a new algorithm is proposed for compressing DNA. Along with intra-chromosomal similarities, inter-chromosomal similarities are considered. The results clearly show that inter-chromosomal matches are longer and more numerous than intra-chromosomal matches, which helps achieve a better compression ratio.
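The substitution idea described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the names `compress` and `decompress`, the token layout, and the minimum repeat length `K` are all assumptions made for illustration. It replaces exactly matching repeats with pointers, and a flag switches between searching each chromosome on its own and searching among all chromosomes seen so far.

```python
from typing import Dict, List, Tuple, Union

# A token is either one literal base or a pointer (chromosome, start, length).
Token = Union[str, Tuple[int, int, int]]

K = 4  # hypothetical minimum repeat length, chosen only for this toy example


def compress(chroms: List[str], cross_chromosome: bool = True) -> List[List[Token]]:
    """Greedy exact-repeat substitution (illustrative, not the paper's method)."""
    index: Dict[str, Tuple[int, int]] = {}  # k-mer -> earliest (chromosome, position)
    encoded: List[List[Token]] = []
    for c, seq in enumerate(chroms):
        if not cross_chromosome:
            index = {}  # forget earlier chromosomes: single-chromosome search only
        tokens: List[Token] = []
        i = 0        # next base to encode
        indexed = 0  # next k-mer start in seq not yet added to the index
        while i < len(seq):
            # Index only k-mers that end at or before i, so a pointer never
            # refers to bases the decoder has not reconstructed yet.
            while indexed + K <= i:
                index.setdefault(seq[indexed:indexed + K], (c, indexed))
                indexed += 1
            hit = index.get(seq[i:i + K])
            if hit is None:
                tokens.append(seq[i])  # no known repeat starts here: emit a literal
                i += 1
                continue
            rc, rp = hit
            ref = chroms[rc]
            limit = len(ref) if rc < c else i  # no self-overlap within a chromosome
            length = K
            while (i + length < len(seq) and rp + length < limit
                   and ref[rp + length] == seq[i + length]):
                length += 1  # extend the exact match as far as it goes
            tokens.append((rc, rp, length))
            i += length
        # Make the rest of this chromosome available to later chromosomes.
        for p in range(indexed, len(seq) - K + 1):
            index.setdefault(seq[p:p + K], (c, p))
        encoded.append(tokens)
    return encoded


def decompress(encoded: List[List[Token]]) -> List[str]:
    """Rebuild the chromosomes exactly, resolving pointers in order."""
    chroms: List[str] = []
    for tokens in encoded:
        cur = ""
        for tok in tokens:
            if isinstance(tok, str):
                cur += tok
            else:
                rc, rp, length = tok
                src = chroms[rc] if rc < len(chroms) else cur
                cur += src[rp:rp + length]
        chroms.append(cur)
    return chroms


# Made-up sequences: the second shares a long repeat with the first.
genome = ["ACGTACGTTTGACCA", "GGACGTACGTTTGACCAGG"]
cross = compress(genome, cross_chromosome=True)
single = compress(genome, cross_chromosome=False)
assert decompress(cross) == genome and decompress(single) == genome  # lossless
print(sum(map(len, cross)), sum(map(len, single)))  # token counts per mode
```

On this made-up input the second chromosome collapses to a handful of tokens when earlier chromosomes may serve as references, while the single-chromosome search finds only a short repeat. Real genomes would call for a proper index structure and a compact binary encoding of the tokens, which this sketch leaves out.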

Highlights

Genetic data plays a key role in healthcare, but it is typically very large
Need for compression in the health care industry: as the health care industry shifts from classical methods to more prediction-based methods [16,17,18,19], the need to store biological information efficiently is becoming more critical
A new compression algorithm is proposed for the DNA sequence compression problem

Summary

Introduction

Genetic data plays a key role in healthcare, but it is typically very large. Many studies show that the absence of DNA information at the right time is one of the major causes of error in healthcare. Data compression is a challenge when genetic information is expanding exponentially. The main objective of this paper is to develop a lossless DNA compression algorithm that gives better compression and helps in retrieving information for efficient use in healthcare. All the prevailing genetic compression methods search for similarity within a chromosome. Whichever mode the existing genetic compression algorithms use, they are all based on searching for similarities within a single chromosome, i.e. they exploit only intra-chromosomal similarities. The results clearly show that inter-chromosomal matches, those between different chromosomes, are longer and more numerous than intra-chromosomal matches, which helps achieve a better compression ratio. There is a need for a genetic data compression algorithm that yields the best compression ratio and retrieves the data in minimal time.

Objectives

Methods

Conclusion