Design of Capacity-Approaching Constrained Codes for DNA-Based Storage Systems

Kees A. Schouhamer Immink, Kui Cai

doi:10.1109/lcomm.2017.2775608

Abstract

We consider coding techniques that limit the lengths of homopolymer runs in strands of nucleotides used in DNA-based mass data storage systems. We compute the maximum number of user bits that can be stored per nucleotide when a maximum homopolymer runlength constraint is imposed. We describe simple and efficient implementations of coding techniques that avoid the occurrence of long homopolymers; the rates of the constructed codes are close to the theoretical maximum. The proposed sequence replacement method for $k$-constrained $q$-ary data yields a significant improvement in coding redundancy over the prior-art sequence replacement method for $k$-constrained binary data. Using a simple transformation, standard binary maximum runlength limited sequences can be transformed into maximum runlength limited $q$-ary sequences, which opens the door to applying the vast body of prior-art binary code constructions to DNA-based storage.
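To make the notion of "maximum number of user bits per nucleotide" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard counting argument for runlength-limited sequences: a $q$-ary sequence whose runs of identical symbols have length at most $m$ can be extended either by one of the $q-1$ other symbols or, if the current run is shorter than $m$, by repeating the last symbol. The capacity is then $\log_2$ of the largest real root of $x^m = (q-1)(x^{m-1} + \cdots + 1)$. The symbol $m$ and the function names `count_sequences` and `capacity` are illustrative choices, not taken from the text above.

```python
import math


def count_sequences(n, q=4, m=3):
    """Count q-ary sequences of length n with every run of identical symbols <= m."""
    if n == 0:
        return 1
    # runs[j] = number of valid sequences whose final run has length j + 1
    runs = [q] + [0] * (m - 1)
    for _ in range(n - 1):
        total = sum(runs)
        # Appending a different symbol (q - 1 choices) starts a new run of length 1;
        # repeating the last symbol lengthens the run, which may not exceed m.
        runs = [(q - 1) * total] + runs[:-1]
    return sum(runs)


def capacity(q=4, m=3, tol=1e-12):
    """Bits per symbol: log2 of the largest real root of x^m = (q-1)(x^(m-1)+...+1)."""
    def f(x):
        return x**m - (q - 1) * sum(x**i for i in range(m))

    # f(q - 1) <= 0 and f(q) = 1 > 0, so the root lies in [q - 1, q]; bisect.
    lo, hi = float(q - 1), float(q)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return math.log2((lo + hi) / 2)


if __name__ == "__main__":
    for m in range(1, 5):
        print(f"m = {m}: capacity = {capacity(4, m):.4f} bits/nucleotide")
```

For $q = 4$ and $m = 1$ (no two adjacent nucleotides equal) the sketch reduces to $\log_2 3$ bits per nucleotide, and the value approaches the unconstrained 2 bits per nucleotide as $m$ grows.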

Full Text