Bio-Constrained Codes with Neural Network for Density-Based DNA Data Storage

Abdur Rasool, Qingshan Jiang, Yang Wang, Qiang Qu

doi:10.3390/math10050845

Abstract

DNA has emerged as a cutting-edge medium for digital information storage because of its extremely high density and long-term durability, which make it well suited to accommodating the data explosion. However, DNA strands are prone to errors during hybridization. In addition, the cost of DNA synthesis and sequencing depends on the number of nucleotides. An efficient model that stores a large amount of data in a small number of nucleotides, while controlling hybridization errors among base pairs, is therefore essential. In this paper, a novel computational model is presented to design large DNA libraries of oligonucleotides. It integrates a neural network (NN) with combinatorial biological constraints: constant GC-content, a minimum Hamming distance, and a reverse-complement constraint. We develop a simple and efficient NN implementation that produces optimal DNA codes, opening the door to applying neural networks to DNA-based data storage. Further, the combinatorial bio-constraints are introduced to improve the lower bounds on code size and to avoid errors in the DNA codes. Our goal is to compute large DNA codes of short length that avoid non-specific hybridization errors by satisfying the bio-constrained coding. The proposed model significantly improves the DNA library by explicitly constructing larger codes than previously published codes.
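The three constraints named above have standard combinatorial definitions in the DNA-coding literature: every codeword of length n contains exactly w G/C bases, every pair of codewords differs in at least d positions, and every codeword differs in at least d positions from the reverse complement of every codeword, including its own. The Python sketch below checks a candidate codebook against those definitions. The function names and the parameters d and w are illustrative assumptions, not taken from the paper, and the sketch says nothing about how the NN model generates codewords.

```python
from itertools import combinations

# Watson-Crick complement: A<->T, C<->G
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Complement each base, then reverse the strand."""
    return s.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strands differ."""
    if len(a) != len(b):
        raise ValueError("strands must have equal length")
    return sum(x != y for x, y in zip(a, b))


def gc_count(s: str) -> int:
    """Number of G or C bases in a strand."""
    return sum(base in "GC" for base in s)


def constraint_violations(code: list[str], d: int, w: int) -> list[str]:
    """Hypothetical checker: list every violated constraint (empty list = valid code)."""
    problems = []
    for x in code:
        # Constant GC-content: exactly w G/C bases per codeword.
        if gc_count(x) != w:
            problems.append(f"GC-content: {x}")
        # A codeword must also be far from its own reverse complement.
        if hamming(x, reverse_complement(x)) < d:
            problems.append(f"self reverse-complement: {x}")
    for x, y in combinations(code, 2):
        # Minimum Hamming distance between distinct codewords.
        if hamming(x, y) < d:
            problems.append(f"Hamming distance: {x}, {y}")
        # Reverse-complement constraint between distinct codewords.
        if hamming(x, reverse_complement(y)) < d:
            problems.append(f"reverse-complement: {x}, {y}")
    return problems


print(constraint_violations(["AAAACCCC", "TTTTGGGG"], d=4, w=4))  # → []
```

Returning the list of violations, rather than a single boolean, makes it easy to see which constraint a rejected codebook breaks.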

Highlights

DNA data storage has three key steps [1–7]: (i) digital data are converted into binary data, which are then encoded into DNA strands, that is, strings over the quaternary alphabet {A, C, G, T} called DNA codes or codewords; (ii) these strands are synthesized into oligonucleotides by a DNA synthesizer, and the oligonucleotides are stored; (iii) the stored strands are read by DNA sequencing and decoded to retrieve the data.
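Step (i) can be illustrated with the most direct binary-to-quaternary mapping, two bits per nucleotide. The Python sketch below assumes the hypothetical assignment 00→A, 01→C, 10→G, 11→T; the paper does not specify this mapping. Such a raw mapping enforces none of the bio-constraints (GC-content, Hamming distance, reverse complement), which is exactly why constrained DNA codes are needed.

```python
# Hypothetical 2-bits-per-nucleotide mapping (illustrative, not the paper's scheme).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as a DNA string, 4 nucleotides per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Decode a DNA string produced by bytes_to_dna back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(bytes_to_dna(b"Hi"))  # → CAGACGGC
```

Note that `bytes_to_dna(b"\x00")` yields the homopolymer run `AAAA` and GC-content 0, both of which constrained codes are designed to avoid.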

Summary

Introduction

The exponential increase in big data demands storage with high density and capacity. Inspired by nature, DNA (deoxyribonucleic acid) has various features that make it suitable for digital data storage. In 2017, Erlich [3] delivered a seminal work on DNA data storage by proposing a fountain code with a GC-content of 45–55% and a minimum Hamming distance (d). This approach achieved a net information density of 1.57 bits per nucleotide, but errors still occurred. Reference [10] proposed a novel altruistic algorithm with lower bounds to generate stable, constraint-based DNA codes; it used constant GC-content and a minimum Hamming distance and reported an improved number of DNA codewords. This paper introduces a more efficient coding technique with a novel computational model based on biologically inspired computing: it uses a neural network (NN) with biological constraints to achieve high-density DNA data storage. The combinatorial bio-constraints, namely GC-content, the reverse-complement (RC) constraint, and Hamming distance, are imposed on the DNA codes to avoid non-specific hybridization, sequencing errors, and secondary structures.
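Constructions such as the one in [10] are compared by the number of codewords they produce for a given length n, minimum distance d, and GC-content w. As a point of comparison only, the Python sketch below shows the simplest constructive baseline, a greedy lexicographic search that keeps every candidate compatible with the codewords chosen so far. It is neither the altruistic algorithm of [10] nor the paper's NN model; the function name and parameters are hypothetical.

```python
from itertools import product

# Watson-Crick complement: A<->T, C<->G
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Complement each base, then reverse the strand."""
    return s.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strands differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_dna_code(n: int, d: int, w: int) -> list[str]:
    """Hypothetical greedy baseline: length-n code with constant GC-content w,
    minimum Hamming distance d, and the reverse-complement constraint."""
    code: list[str] = []
    # product() yields all 4**n candidates in lexicographic order (A < C < G < T).
    for letters in product("ACGT", repeat=n):
        x = "".join(letters)
        if sum(base in "GC" for base in x) != w:
            continue  # constant GC-content
        rc_x = reverse_complement(x)
        if hamming(x, rc_x) < d:
            continue  # too close to its own reverse complement
        # H(rc(x), y) == H(x, rc(y)), so one check covers both directions.
        if all(hamming(x, y) >= d and hamming(rc_x, y) >= d for y in code):
            code.append(x)
    return code


if __name__ == "__main__":
    code = greedy_dna_code(n=6, d=3, w=3)
    print(len(code), code[:3])
```

Because it enumerates all 4^n words, this baseline is practical only for short codewords; the size of the code it returns is a constructive lower bound that better methods aim to exceed.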

Deep Neural Networks for DNA Codes

DNA Coding with Combinatorial Bio-Constraints

Preliminaries and Notations

Proposed Model

NN-Based DNA Codes

Combinatorial Constraints

Result

Findings

Conclusions


Journal: Mathematics	Publication Date: Mar 7, 2022
Citations: 19	License type: CC BY 4.0


Similar Papers

On conflict free DNA codes
Krishna Gopal Benerjee ... Sourav Deb
Cryptography and Communications | VOL. 13
13 Oct 2020

Long-Range Ordered Water Correlations between A–T/C–G Nucleotides
Zhonglong Luo ... Lei Jiang
Matter | VOL. 3
28 Aug 2020

DNA Code Design Based on the Bloch Quantum Chaos Algorithm
Qingji Guo ... Changjun Zhou
IEEE Access | VOL. 5
01 Jan 2017

Generating DNA Code Words Using Forbidding and Enforcing Systems
Daniela Genova ... Kalpana Mahalingam
01 Jan 2012
