Information Density Enhancement Using Lossy Compression in DNA Data Storage.

Seongjun Seo, Anshula Tandon, Keun Woo Lee, Jee-Hyong Lee, Sung Ha Park

doi:10.1002/adma.202403071

Abstract

This study develops two deoxyribonucleic acid (DNA) lossy compression models, Models A and B, to encode grayscale images into DNA sequences, enhance information density, and enable high-fidelity image recovery. These models, distinguished by their handling of pixel domains and their interpolation methods, offer a novel approach to DNA data storage. Model A processes pixels in overlapped domains using linear interpolation (LI), whereas Model B uses non-overlapped domains with nearest-neighbor interpolation (NNI). In a comparative analysis with Joint Photographic Experts Group (JPEG) compression, the DNA lossy compression models demonstrate competitive advantages in information density and restored image quality. Applying these models to the Modified National Institute of Standards and Technology (MNIST) dataset demonstrates their efficiency and the recognizability of the decompressed images, as validated by convolutional neural network (CNN) performance. In particular, Model B2, a version of Model B, emerges as an effective method for balancing high information density (more than 20 times the typical density of two bits per nucleotide) with reasonably good image quality. These findings highlight the potential of DNA-based data storage systems for high-density and efficient compression, indicating a promising future for biological data storage solutions.
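To make the terms in the abstract concrete, the following Python sketch illustrates the general idea of lossy downsampling, a two-bits-per-nucleotide encoding, and the two interpolation styles. It is not the authors' Model A or Model B: the block-mean downsampling, the `00/01/10/11` to `A/C/G/T` mapping, and all function names are assumptions chosen for illustration, and image dimensions are assumed to be divisible by the block size `k`.

```python
# Illustrative sketch only; the mapping and all function names are hypothetical.
NUCLEOTIDES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data):
    """Encode each byte as four nucleotides, i.e. 2 bits per nucleotide."""
    return "".join(
        NUCLEOTIDES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def downsample(image, k):
    """Replace each non-overlapping k x k block by its rounded mean value."""
    height, width = len(image), len(image[0])
    return [
        [
            round(
                sum(image[r + i][c + j] for i in range(k) for j in range(k))
                / (k * k)
            )
            for c in range(0, width, k)
        ]
        for r in range(0, height, k)
    ]


def upsample_nearest(small, k):
    """Nearest-neighbor interpolation: repeat each stored pixel over a k x k block."""
    return [
        [small[r // k][c // k] for c in range(len(small[0]) * k)]
        for r in range(len(small) * k)
    ]


def upsample_linear_row(samples, k):
    """1D linear interpolation: fill k - 1 values between neighboring samples."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.extend(a + (b - a) * t / k for t in range(k))
    out.append(samples[-1])
    return out


def effective_density(original_pixels, nucleotides, bits_per_pixel=8):
    """Bits of the original image represented per stored nucleotide."""
    return original_pixels * bits_per_pixel / nucleotides


if __name__ == "__main__":
    image = [
        [10, 10, 200, 200],
        [10, 10, 200, 200],
        [50, 50, 90, 90],
        [50, 50, 90, 90],
    ]
    small = downsample(image, 2)
    strand = bytes_to_dna(bytes(p for row in small for p in row))
    print(strand, effective_density(16, len(strand)))
```

Under these assumptions, storing one 8-bit value per `k x k` block gives an effective density of `2 * k * k` bits per nucleotide relative to the original image, so `k = 2` yields 8 bits per nucleotide. The price is whatever detail the interpolation step cannot restore.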

Full Text