Abstract
Owing to their high information density and longevity, DNA storage systems have begun to attract considerable attention. However, insertion, deletion, and substitution errors that occur during DNA synthesis and sequencing remain common obstacles to DNA storage. In this paper, we first explain a method to convert binary data into sequences with a general maximum run-length $r$ and a specific length construction, which can be used as the message sequence of our proposed code. Then, we propose a new nonbinary systematic code that corrects a single insertion or deletion error, together with its corresponding encoding algorithm. In the proposed code, the maximum run-length $r$ of the parity sequence is fixed at three. Additionally, the last parity symbol and the first message symbol are always different. Hence, the overall maximum run-length $r$ of the output codeword is guaranteed to be three when the maximum run-length of the message sequence is three. Finally, we determine the feasibility of the proposed encoding algorithm, verify successful decoding when a single insertion/deletion error occurs in the codeword, and compare the results with relevant works.
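The run-length argument above can be illustrated with a minimal sketch. It is not the proposed encoder: the helper `max_run_length` and the example sequences are hypothetical, and the codeword is assumed to be the parity sequence followed by the message sequence, as the wording "last parity symbol and first message symbol" suggests. The sketch only checks that joining two sequences, each with maximum run-length at most three, keeps the maximum run-length at three when the symbols at the junction differ.

```python
from itertools import groupby


def max_run_length(seq: str) -> int:
    """Return the length of the longest run of identical consecutive symbols."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


# Hypothetical sequences over the DNA alphabet {A, C, G, T}; not from the paper.
parity = "ACCCG"       # maximum run-length 3 ("CCC")
message = "TTTAGGGC"   # maximum run-length 3 ("TTT", "GGG")

# Assumed codeword layout: parity sequence followed by message sequence.
# The junction symbols differ ("G" vs "T"), so no run can span the boundary.
assert parity[-1] != message[0]
codeword = parity + message
assert max_run_length(codeword) == 3

# Counterexample: if the junction symbols were equal, runs could merge.
bad_parity = "ACGT"    # ends in "T", the first symbol of "TTTA"
assert max_run_length(bad_parity + "TTTA") == 4
```

The counterexample shows why the junction condition matters: without it, a run at the end of the parity sequence and a run at the start of the message sequence can merge into a run longer than three.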
Highlights
As people gradually rely on more and more data, the hardware of data storage systems has been gradually upgraded
We propose a new nonbinary single insertion/deletion error correction (SIDEC) code with the maximum run-length r constraint and systematic encoding algorithm
We present an application of a codeword where the maximum run-length r is three for DNA storage
Summary
As people gradually rely on more and more data, the hardware of data storage systems has been gradually upgraded. A new binary SIDEC code combined with the maximum run-length r constrained code and an efficient systematic encoding algorithm was proposed in [18]. All of these studies [8], [9], [11], [12], [17] focused on binary coding schemes for DNA storage systems. Insertion and deletion errors inevitably occur in the process of DNA synthesis and sequencing. For these purposes, we propose a new nonbinary SIDEC code with the maximum run-length r constraint and a systematic encoding algorithm. Simulation results show that the encoding algorithm is feasible and that the q-ary SIDEC code with the maximum run-length r constraint can correct a single deletion or insertion error.