A Strategy-based Optimization Algorithm to Design Codes for DNA Data Storage System

Abdur Rasool, Qingshan Jiang, Yang Wang, Qiang Qu

doi:10.1007/978-3-030-95388-1_19

Abstract

The exponential growth of big data volumes demands large-capacity, high-density storage. Deoxyribonucleic acid (DNA) has recently emerged as a promising data storage medium in various studies because of its high capacity and durability, and primers and address sequences play a vital role in such systems. However, designing DNA strands without errors is a critical biocomputing task. In the DNA synthesis and sequencing process, repeated nucleotides are prone to errors during hybridization reactions. These errors decrease the lower bounds of DNA coding sets and thereby undermine the stability of data storage. This study proposes a metaheuristic algorithm to improve the lower bounds of DNA coding sets for data storage. The proposed algorithm is inspired by the moth-flame optimizer (MFO), which has exploration and exploitation capability in one dimension, and enhances it with an opposition-based learning (OBL) strategy over a three-dimensional search space to find the optimal solution; the result is hereafter referred to as the MFOL algorithm. MFOL is programmed to construct DNA storage codes that reduce the error rates of DNA coding sets under GC-content, Hamming distance, and no-runlength constraints. In the experiments, 13 benchmark functions and the Wilcoxon rank-sum test are used, and performance is compared with the original MFO and three other algorithms. The DNA codewords generated by MFOL are compared with those of the state-of-the-art Altruistic algorithm and the KMVO algorithm. The proposed algorithm improved DNA coding rates by 30% with shorter sequences, reducing errors during DNA synthesis and sequencing.
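To make the three coding constraints concrete, the sketch below checks a candidate DNA code set against them and builds a small code set by a simple greedy scan. It assumes common formulations from the DNA-storage coding literature rather than the paper's exact definitions: the GC-content constraint is taken to mean exactly `len(word) // 2` bases are G or C, the no-runlength constraint forbids two identical adjacent bases, and the Hamming distance constraint requires every pair of codewords to differ in at least `d` positions. The function names (`gc_ok`, `no_run_ok`, `greedy_code`, and so on) are hypothetical, and the greedy builder is only a baseline for illustrating what the size (lower bound) of a constrained code set means; it is not MFOL.

```python
from itertools import combinations, product

BASES = "ACGT"


def gc_ok(word: str) -> bool:
    # Assumed GC-content constraint: exactly half (floor) of the bases are G or C.
    return sum(1 for b in word if b in "GC") == len(word) // 2


def no_run_ok(word: str) -> bool:
    # Assumed no-runlength constraint: no two adjacent bases are identical.
    return all(a != b for a, b in zip(word, word[1:]))


def hamming(u: str, v: str) -> int:
    # Number of positions at which two equal-length codewords differ.
    return sum(a != b for a, b in zip(u, v))


def is_valid_code(words: list[str], d: int) -> bool:
    # Every codeword must satisfy the per-word constraints ...
    if not all(gc_ok(w) and no_run_ok(w) for w in words):
        return False
    # ... and every pair must be at Hamming distance >= d.
    return all(hamming(u, v) >= d for u, v in combinations(words, 2))


def greedy_code(n: int, d: int) -> list[str]:
    # Hypothetical baseline: scan all 4^n words in lexicographic order and keep
    # each one that satisfies the constraints against the words kept so far.
    # The size of the result is one (usually weak) lower bound on the code size.
    code: list[str] = []
    for letters in product(BASES, repeat=n):
        w = "".join(letters)
        if gc_ok(w) and no_run_ok(w) and all(hamming(w, c) >= d for c in code):
            code.append(w)
    return code


if __name__ == "__main__":
    code = greedy_code(6, 3)
    print(len(code), code[:5])
```

A metaheuristic such as MFOL searches for larger code sets than a greedy scan typically finds; the checker above is the part any such search would share, because every candidate set must pass it.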
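The core idea of combining MFO with opposition-based learning can be sketched on a continuous benchmark function. The code below follows the standard MFO update (moths fly along a logarithmic spiral `D * exp(b*t) * cos(2*pi*t) + F` around their assigned flame, the number of flames shrinks linearly, and `t` is drawn from `[r, 1]` with `r` decreasing from -1 to -2) and adds an OBL step in which the opposite point `lb + ub - x` of each moth is also evaluated. This is a generic MFO+OBL sketch under those assumptions; it does not reproduce the paper's three-dimensional OBL search space or its mapping from continuous positions to DNA codewords, and the names `mfo_obl`, `opposite`, and `sphere` are hypothetical.

```python
import math
import random


def sphere(x: list[float]) -> float:
    # Simple unimodal benchmark: sum of squares, minimum 0 at the origin.
    return sum(v * v for v in x)


def opposite(x: list[float], lb: float, ub: float) -> list[float]:
    # Opposition-based learning: the opposite of x within [lb, ub] is lb + ub - x.
    return [lb + ub - v for v in x]


def mfo_obl(f, dim=5, n=20, iters=200, lb=-10.0, ub=10.0, b=1.0, seed=0):
    rng = random.Random(seed)

    def clip(v: float) -> float:
        return min(ub, max(lb, v))

    moths = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    # OBL initialisation: keep the best n among the moths and their opposites.
    pool = moths + [opposite(m, lb, ub) for m in moths]
    flames = sorted(pool, key=f)[:n]
    moths = [m[:] for m in flames]

    for it in range(iters):
        # Number of flames decreases linearly from n towards 1.
        n_flames = round(n - it * (n - 1) / iters)
        # Convergence constant r decreases linearly from -1 to -2.
        r = -1 - it / iters
        for i, moth in enumerate(moths):
            # Moths beyond the current flame count all follow the last flame.
            flame = flames[min(i, n_flames - 1)]
            for j in range(dim):
                t = (r - 1) * rng.random() + 1  # t in [r, 1]
                dist = abs(flame[j] - moth[j])
                moth[j] = clip(dist * math.exp(b * t) * math.cos(2 * math.pi * t) + flame[j])
        # Flames = best n of previous flames, current moths, and their opposites.
        candidates = [m[:] for m in moths] + [opposite(m, lb, ub) for m in moths] + flames
        flames = sorted(candidates, key=f)[:n]

    return flames[0], f(flames[0])


if __name__ == "__main__":
    best, value = mfo_obl(sphere)
    print(value)
```

Because the previous flames are always kept among the candidates, the best objective value can never get worse from one iteration to the next; the opposite points simply give the search a second, mirrored sample of the space at each step.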

Full Text