Sequence Reconstruction Under Stutter Noise in Enzymatic DNA Synthesis

Roy Shafir, Eitan Yaakobi, Leon Anavy, Zohar Yakhini, Omer Sabary

doi:10.1109/itw48936.2021.9611362

Abstract

Synthetic DNA is an attractive alternative data storage medium due to its high information density, low energy usage, and exceptional robustness. Enzymatic DNA synthesis was recently introduced to allow cost-effective synthesis of longer DNA molecules for data storage. This method is characterized by stutter errors: sticky insertions in which any base of the designed sequence may be synthesized more than once. In this work, we study the problem of reconstructing the original sequence from a set of noisy reads produced by stuttering enzymatic synthesis. We present several reconstruction algorithms and analyze their expected success probability and error rate in three scenarios that differ in how much is known about the stutter errors. We evaluate algorithmic performance both analytically and through simulations, with particular attention to performance as a function of the read depth. Our findings can be used to evaluate the trade-offs between synthesis quality indicators and the sequencing depth required for reconstruction with high probability. In principle, the probability of reconstruction failure decays exponentially with the sequencing depth, as we demonstrate. We also analyze the use of error-correcting codes to improve the error performance.
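As an illustration of the setting, the following minimal Python sketch simulates stutter noise and a naive reconstruction rule. Everything in it is an assumption made for illustration and is not taken from the paper: the noise model (each base receives extra copies with a fixed probability `p`, independently), the function names `stutter`, `run_lengths`, and `reconstruct`, and the reconstruction rule itself (take the minimum observed length of each run across all reads). The rule relies only on the property stated above, that stutter errors are sticky insertions, so they can lengthen a run of identical bases but never shorten it or change the order of runs.

```python
import random
from itertools import groupby


def stutter(seq, p, rng):
    """Hypothetical noise model: after writing each base, keep appending
    extra copies of it while a coin with success probability p comes up heads."""
    out = []
    for base in seq:
        out.append(base)
        while rng.random() < p:
            out.append(base)
    return "".join(out)


def run_lengths(seq):
    """Run-length encode a sequence, e.g. "AACG" -> [("A", 2), ("C", 1), ("G", 1)]."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


def reconstruct(reads):
    """Naive estimator (an assumption, not the paper's algorithm):
    for every run, keep the shortest length seen in any read."""
    encoded = [run_lengths(read) for read in reads]
    # Sticky insertions only lengthen runs, so all reads share the same run symbols.
    symbols = [base for base, _ in encoded[0]]
    minima = [min(runs[i][1] for runs in encoded) for i in range(len(symbols))]
    return "".join(base * n for base, n in zip(symbols, minima))


if __name__ == "__main__":
    rng = random.Random(0)
    designed = "ACGTTGCA"
    reads = [stutter(designed, 0.3, rng) for _ in range(20)]
    print(reconstruct(reads))
```

Under this assumed model, the estimate of a run is wrong only if every read contains at least one extra copy in that run. Because the reads are independent, that probability shrinks geometrically as more reads are added, which matches the qualitative behavior described in the abstract.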

Full Text