Abstract

There has been rapid progress in the development of error-correcting and constrained codes for DNA storage systems in recent years. However, improving the steps for processing raw sequencing data for DNA storage has a lot of untapped potential for further progress. In particular, constraints can be used as prior information to improve the processing of DNA sequencing data. Furthermore, a workflow tailored to DNA storage codes enables fair comparisons between different approaches while leading to reproducible results. We present RepairNatrix, a read-processing workflow for DNA storage. RepairNatrix supports preprocessing of raw sequencing data for DNA storage applications and can be used to flag and heuristically repair constraint-violating sequences to further increase the recoverability of encoded data in the presence of errors. Compared to a preprocessing strategy without repair functionality, RepairNatrix reduced the number of raw reads required for the successful, error-free decoding of the input files by a factor of 25-35 across different datasets. RepairNatrix is available on Github: https://github.com/umr-ds/repairnatrix.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.