Abstract

Due to the development of high-throughput DNA sequencing technology, genome-sequencing costs have been significantly reduced, which has led to a number of revolutionary advances in the genetics industry. However, while the time and cost needed for DNA sequencing have decreased, the management of such large volumes of data is still an issue. Security and reliability issues also exist in public sequence databases. Therefore, this research proposes Blockchain Applied FASTQ and FASTA Lossless Compression (BAQALC), a lossless compression algorithm that allows for the efficient transmission and storage of the immense amounts of DNA sequence data that are being generated by Next Generation Sequencing (NGS). Compression ratios were compared for genetic biomarkers corresponding to the five diseases with the highest mortality rates according to the World Health Organization. The results showed an average compression ratio of approximately 12 for all the genetic datasets used. BAQALC performed especially well for lung cancer genetic markers, with a compression ratio of 17.02. BAQALC achieved higher compression ratios not only than widely used compression algorithms, but also than algorithms described in previously published research. The proposed solution is envisioned to contribute to an efficient and secure transmission and storage platform for next-generation medical informatics based on smart devices, for both researchers and healthcare users.

Highlights

  • Due to the development of high throughput DNA sequencing technology, genome sequencing costs have been significantly reduced, which has led to a number of revolutionary advances in the genetics industry [1]

  • We use formats collected from the NCBI Sequence Read Archive (SRA) [40]

  • An overall trend of DNA compression solutions can be inferred from this research


Summary

Introduction

Due to the development of high-throughput DNA sequencing technology, genome sequencing costs have been significantly reduced, which has led to a number of revolutionary advances in the genetics industry [1]. Management refers to the transmission and storage of the sequence data. Security and reliability issues remain obstacles in public sequence databases [5], which is even more discouraging to researchers. This is expected to be a problem for next-generation medical informatics, such as personal health record (PHR) systems [6], where healthcare consumers own their entire health data [...]. DNA sequence data is no exception. It has already been emphasized in prior research [8,9,10] that protective approaches are required to defend against genome data attacks. Efficient sequencing compression methods are in high demand, not only by industry [...] E-Health [17], or m-Health [...]
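
The compression ratios quoted in the abstract can be made concrete with a small sketch. It assumes that "compression ratio" means original size divided by compressed size, which this excerpt does not state explicitly. It does not implement BAQALC: the 2-bit packing scheme, the function names, and the toy sequence are all hypothetical, and zlib stands in for a widely used general-purpose compressor.

```python
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Original size divided by compressed size (assumed definition)."""
    if not compressed:
        raise ValueError("compressed data must be non-empty")
    return len(original) / len(compressed)


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into 2 bits per base (illustrative, not BAQALC)."""
    codes = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | codes[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Invert pack_2bit; `length` is the original number of bases."""
    bases = "ACGT"
    seq = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            seq.append(bases[(byte >> shift) & 3])
    return "".join(seq[:length])


# Hypothetical toy sequence; real FASTA/FASTQ records also carry headers,
# quality scores, and ambiguity codes such as N, which this sketch ignores.
sequence = "ACGT" * 256
raw = sequence.encode("ascii")

packed = pack_2bit(sequence)
deflated = zlib.compress(raw, 9)

print("2-bit packing ratio:", compression_ratio(raw, packed))
print("zlib ratio:", compression_ratio(raw, deflated))
```

Plain 2-bit packing caps the ratio at 4 for one-byte-per-base text, so ratios such as those reported in the abstract imply that redundancy beyond the four-letter alphabet is being exploited. Both paths here are lossless: unpacking or decompressing returns the original bytes exactly.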

DNA Data Composition
Data Compression
Prior Research on DNA Compression
Blockchain Technology
Overall Architecture
Overall
Proposed Solution
Materials and Methods
Comparison between Datasets
Compression Ratio Comparison to Widely Used Algorithms
Compression Ratio Comparison to Related Research
Overall CR and Stability Comparison
Discussion and Conclusions