HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints

William H Press, John A Hawkins, Stephen K Jones, Jeffrey M Schaub, Ilya J Finkelstein

doi:10.1073/pnas.2004821117

Abstract

Synthetic DNA is rapidly emerging as a durable, high-density information storage platform. A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing. Here, we describe the HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search) error-correcting code that repairs all three basic types of DNA errors: insertions, deletions, and substitutions. HEDGES also converts unresolved or compound errors into substitutions, restoring synchronization for correction via a standard Reed-Solomon outer code that is interleaved across strands. Moreover, HEDGES can incorporate a broad class of user-defined sequence constraints, such as avoiding excess repeats, or too high or too low windowed guanine-cytosine (GC) content. We test our code both via in silico simulations and with synthesized DNA. From its measured performance, we develop a statistical model applicable to much larger datasets. Predicted performance indicates the possibility of error-free recovery of petabyte- and exabyte-scale data from DNA degraded with as much as 10% errors. As the cost of DNA synthesis and sequencing continues to drop, we anticipate that HEDGES will find applications in large-scale error-free information encoding.
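To make the sequence constraints mentioned above concrete, the following is a minimal sketch of a check for two of them: excess repeats (here taken to mean homopolymer runs) and windowed GC content. The function names, the window length, and the thresholds are illustrative assumptions, not values taken from the paper.

```python
def gc_fraction(window: str) -> float:
    """Fraction of bases in the window that are G or C."""
    return sum(base in "GC" for base in window) / len(window)


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def satisfies_constraints(seq: str, window: int = 12,
                          gc_min: float = 0.35, gc_max: float = 0.65,
                          max_run: int = 4) -> bool:
    """Check a strand against hypothetical repeat and GC limits.

    All default thresholds are assumptions chosen for illustration.
    """
    if not seq:
        return True  # an empty strand violates nothing
    if max_homopolymer_run(seq) > max_run:
        return False
    # Slide a fixed-length window along the strand; a strand shorter
    # than the window is checked once as a whole.
    for start in range(max(1, len(seq) - window + 1)):
        gc = gc_fraction(seq[start:start + window])
        if not gc_min <= gc <= gc_max:
            return False
    return True


if __name__ == "__main__":
    for strand in ("ACGTACGTACGTACGT", "AAAAAACGTACG", "GCGCGCGCGCGC"):
        print(strand, satisfies_constraints(strand))
```

This sketch only validates a finished strand. It does not show how HEDGES incorporates such constraints into the code itself, which the abstract does not detail.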


Journal: Proceedings of the National Academy of Sciences of the United States of America
Publication Date: Jul 16, 2020
Citations: 84
License type: CC BY-NC-ND 4.0


