Abstract

DNA storage has been a thriving interdisciplinary research area because of its high density, low maintenance cost, and long durability for information storage. However, the complexity of errors in DNA sequences including substitutions, insertions and deletions hinders its application for massive data storage. Motivated by the divide-and-conquer algorithm, we propose a hierarchical error correction strategy for text DNA storage. The basic idea is to design robust codes for common characters which have one-base error correction ability including insertion and/or deletion. The errors are gradually corrected by the codes in DNA reads, multiple alignment of character lines, and finally word spelling. On one hand, the proposed encoding method provides a systematic way to design storage friendly codes, such as 50% GC content, no more than 2-base homopolymers, and robustness against secondary structures. On the other hand, the proposed error correction method not only corrects single insertion or deletion, but also deals with multiple insertions or deletions. Simulation results demonstrate that the proposed method can correct more than 98% errors when error rate is less than or equal to 0.05. Thus, it is more powerful and adaptable to the complicated DNA storage applications.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.