Reading digital information from DNA, a highly dense yet lightweight storage medium, currently relies on time-consuming next-generation sequencing. Nanopore sequencing promises to overcome this efficiency problem, but its high indel error rates mean that large amounts of high-quality data are required for accurate readout. Here we introduce Composite Hedges Nanopores, a codec system capable of handling indel rates up to 15.9% and substitution rates up to 7.8%. The overall information density can be doubled from 0.59 to 1.17 by using a degenerate eight-letter alphabet. We demonstrate that sequencing times of 20 and 120 minutes are sufficient for processing representative text and image files, respectively. Moreover, we estimate that complete data recovery requires 4× and 8× physical redundancy of composite strands for text and image data, respectively. Our codec system excels in both molecular design and equalized dictionary usage, laying a solid foundation for approaching real-time DNA data retrieval and encoding.