Abstract

Owing to its high storage density and longevity, DNA storage has become an attractive technology for future data storage systems. However, the writing and reading costs are still high, and more efficient techniques for DNA storage are required. In this paper, we propose improved log-likelihood ratio (LLR) processing schemes based on observed statistics for low-density parity-check (LDPC) code decoding, which reduce the reading cost while keeping the encoding scheme unchanged. Because of the mismatch between the real channel and the observed statistics, and because of the limit on the maximum decoder input value, scaling the magnitude of the LLRs can lead to better error-correcting performance. Therefore, we propose two strategies: 1) directly scaling the LLRs and 2) scaling the pairwise substitution error rates, which in turn changes the magnitude of the LLRs. We also suggest a relation between the substitution error rate and the scaling values in both strategies, obtained by curve fitting. Simulation results show that the error-correcting performance of the proposed LLR calculation is better than that of the conventional scheme. Finally, we verify that the proposed LLR methods can be generally applied in DNA storage systems, and we present practical methods to calculate error rates.
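The two strategies named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `bit_llrs`, the 2-bit labeling of bases, the uniform prior over written bases, the clipping limit `llr_max`, and the example scale values (the paper obtains its scaling values by curve fitting, which is not reproduced here).

```python
import math

BASES = "ACGT"
# Hypothetical 2-bit labeling of bases; the source does not specify its mapping.
BASE_TO_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def bit_llrs(observed, sub_rate, llr_scale=1.0, rate_scale=1.0, llr_max=20.0):
    """Per-bit LLRs for one observed base.

    sub_rate[x][y] is the pairwise substitution rate P(read y | written x), x != y.
    rate_scale implements strategy 2 (scale the substitution error rates),
    llr_scale implements strategy 1 (scale the LLR directly).
    """
    # Likelihood P(observed | written) for each candidate written base.
    likelihood = {}
    for written in BASES:
        off_diag = {
            read: rate_scale * sub_rate[written][read]
            for read in BASES
            if read != written
        }
        if written == observed:
            likelihood[written] = 1.0 - sum(off_diag.values())
        else:
            likelihood[written] = off_diag[observed]

    llrs = []
    for i in range(2):
        # Assumes a uniform prior over the four written bases.
        p0 = sum(likelihood[b] for b in BASES if BASE_TO_BITS[b][i] == 0)
        p1 = sum(likelihood[b] for b in BASES if BASE_TO_BITS[b][i] == 1)
        llr = llr_scale * math.log(p0 / p1)
        # Clip to the maximum magnitude the decoder input can represent.
        llrs.append(max(-llr_max, min(llr_max, llr)))
    return llrs


# Illustrative channel: every pairwise substitution rate is 0.01 (made-up value).
uniform = {x: {y: 0.01 for y in BASES if y != x} for x in BASES}
baseline = bit_llrs("A", uniform)
scaled_llr = bit_llrs("A", uniform, llr_scale=0.5)    # strategy 1
scaled_rate = bit_llrs("A", uniform, rate_scale=2.0)  # strategy 2
```

In this sketch both strategies shrink the LLR magnitude relative to the baseline, but strategy 2 does so through the likelihoods, so its effect on each bit depends on the shape of the substitution matrix rather than on a single multiplier.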

Highlights

  • With the rapid increase in the total amount of data, the demand for data storage is increasing

  • We propose two scaling methods suitable for decoding low-density parity-check (LDPC) codes, since the input range of LDPC decoders depends on the error rate and there is a mismatch between the exact channel model and the log-likelihood ratio (LLR) calculation

  • To compare the proposed scaling methods with the conventional one, we use the experimental data in [4], which were synthesized by CustomArray and sequenced with Illumina iSeq technology


Summary

Introduction

With the rapid increase in the total amount of data, the demand for data storage is increasing. In DNA storage, binary data are synthesized into DNA sequences composed of the four nucleotide bases (A, T, G, C) at a certain writing cost [4] and are stored in a DNA pool. The DNA sequences obtained by the sequencer are called reads [4], [5]; they may differ from the original sequence because of errors in the synthesis and sequencing processes. The reading cost is defined to evaluate performance: it is the total number of bases in the minimum set of reads required to recover the original DNA sequences, divided by the number of information bits [4].
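The reading cost definition can be written out as a short calculation. This is a sketch under stated assumptions: the function name `reading_cost` and the example numbers are hypothetical, and finding the minimum set of reads that still allows recovery is taken as given rather than computed.

```python
def reading_cost(read_lengths, num_information_bits):
    """Reading cost in bases per information bit.

    read_lengths: lengths (in bases) of the minimum set of reads that
    still allows the original DNA sequences to be recovered.
    """
    if num_information_bits <= 0:
        raise ValueError("num_information_bits must be positive")
    total_bases = sum(read_lengths)
    return total_bases / num_information_bits


# Made-up example: 1000 reads of 150 bases each, 100000 information bits.
example_cost = reading_cost([150] * 1000, 100_000)
```

A lower value means fewer sequenced bases are needed per stored bit, which is the quantity the proposed LLR processing aims to reduce.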

