Abstract

DNA-based data storage, which stores digital information in chemically synthesized molecules, has developed steadily and attracted growing attention. Efficiently recovering information from large numbers of DNA strands that suffer from insertion, deletion, and substitution errors (collectively referred to as edit errors) is one of the major bottlenecks in DNA-based storage systems. To address this challenge, we propose a segmented edit-error-correcting code with a re-synchronization function, termed the DNA-LM code. Compared with previous segmented error-correcting codes, it has a systematic structure and does not require the endpoint of the received segment as prerequisite information for decoding. When the number of edit errors in a segment exceeds that segment's edit-error-correcting capability, the decoder can regain synchronization so that decoding of the subsequent segments continues. Both the encoding and the decoding complexity are linear in the codeword length. The redundancy of each segment is $\lceil \log k\rceil +6$ quaternary symbols, where $k$ is the length of the message segment. We further generalize the decoding algorithm to handle duplicated DNA strands while keeping its time complexity linear in the codeword length and the number of duplications. Simulations under a stochastic edit-error model show that, at a low raw error rate of next-generation ("next-gen") sequencing, our code enables error-free decoding when concatenated with the (255,223) RS code.
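To give a feel for the quantities quoted above, the following minimal sketch evaluates the per-segment redundancy $\lceil \log k\rceil + 6$ and the resulting segment rate, along with the standard parameters of a (255,223) RS code. The abstract does not state the base of the logarithm, so `log_base` is an assumption exposed as a parameter; all function names are hypothetical and are not taken from the paper.

```python
def ceil_log(k: int, base: int) -> int:
    """Smallest integer r with base**r >= k, computed with integers only
    to avoid floating-point rounding in the ceiling."""
    if k < 1 or base < 2:
        raise ValueError("need k >= 1 and base >= 2")
    r, power = 0, 1
    while power < k:
        power *= base
        r += 1
    return r


def segment_redundancy(k: int, log_base: int = 2) -> int:
    """Redundancy per segment, in quaternary symbols: ceil(log k) + 6.
    ASSUMPTION: the log base is not given in the abstract; 2 is a guess."""
    return ceil_log(k, log_base) + 6


def segment_rate(k: int, log_base: int = 2) -> float:
    """Fraction of each encoded segment that carries message symbols."""
    return k / (k + segment_redundancy(k, log_base))


def rs_correctable_symbol_errors(n: int, k: int) -> int:
    """An (n, k) Reed-Solomon code corrects up to floor((n - k) / 2)
    symbol errors; this is a general RS property, not specific to DNA-LM."""
    return (n - k) // 2


if __name__ == "__main__":
    k = 100  # hypothetical message-segment length, chosen for illustration
    for base in (2, 4):
        print(f"log base {base}: redundancy = {segment_redundancy(k, base)} "
              f"symbols, segment rate = {segment_rate(k, base):.3f}")
    print("RS(255,223): corrects up to",
          rs_correctable_symbol_errors(255, 223), "symbol errors,",
          f"rate = {223 / 255:.3f}")
```

Because the redundancy grows only logarithmically in $k$, longer message segments give a higher segment rate under either choice of base; the trade-off against per-segment error-correcting capability is not modeled here.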
