Molecular-level similarity search brings computing to DNA data storage

Callista Bee, Yuan-Jyue Chen, Karin Strauss, David Ward, Xiaomeng Liu, Luis Ceze, Lee Organick, Georg Seelig, Melissa Queen

doi:10.1038/s41467-021-24991-z

Open Access

https://doi.org/10.1038/s41467-021-24991-z

Abstract

As global demand for digital storage capacity grows, storage technologies based on synthetic DNA have emerged as a dense and durable alternative to traditional media. Existing approaches leverage robust error correcting codes and precise molecular mechanisms to reliably retrieve specific files from large databases. Typically, files are retrieved using a pre-specified key, analogous to a filename. However, these approaches lack the ability to perform more complex computations over the stored data, such as similarity search: e.g., finding images that look similar to an image of interest without prior knowledge of their file names. Here we demonstrate a technique for executing similarity search over a DNA-based database of 1.6 million images. Queries are implemented as hybridization probes, and a key step in our approach was to learn an image-to-sequence encoding ensuring that queries preferentially bind to targets representing visually similar images. Experimental results show that our molecular implementation performs comparably to state-of-the-art in silico algorithms for similarity search.

Highlights

As global demand for digital storage capacity grows, storage technologies based on synthetic DNA have emerged as a dense and durable alternative to traditional media.
Recent advances in DNA nanotechnology have demonstrated synthetic DNA’s ability to carry out molecular computations for biochemical applications, such as gene expression classification[1,2,3,4]. These applications reflect a paradigm shift in the field of DNA computing, away from parallel computing in the style of Adleman’s solution to the traveling salesperson problem[5], and towards DNA strand-displacement circuits[6,7] and algorithmic self-assembly[8,9]. This shift was motivated by the recognition that encoding combinatorial problems requires synthesizing exponential amounts of DNA, and that synthetic DNA is better suited to implement circuits, which autonomously analyze information already encoded in the concentrations and sequences of nucleic acid molecules.
To store an arbitrary digital file in DNA, its binary data are translated into a DNA sequence using error-correcting codes that account for limitations and errors in DNA synthesis and sequencing.
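To make the binary-to-DNA translation concrete, here is a minimal sketch of the simplest possible mapping, two bits per base. It is an illustration only: the function names `bytes_to_dna` and `dna_to_bytes` and the base ordering are assumptions, and the sketch deliberately omits the error-correcting codes and synthesis constraints that practical encodings rely on.

```python
# Hypothetical illustration: a naive 2-bits-per-base mapping.
# It has no error correction and ignores synthesis constraints
# (for example homopolymer runs), unlike practical DNA storage codes.
BASES = "ACGT"  # assumed ordering: A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_dna; the length must be a multiple of four."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            # Shift in two bits per base, in the same order as the encoder.
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)
```

A round trip such as `dna_to_bytes(bytes_to_dna(b"DNA"))` returns the original bytes. A single substituted, inserted, or deleted base would corrupt the decoded data, which is why real encodings add redundancy on top of a mapping like this.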

Summary

Results

Sequence encoder maps similar images to similar DNA sequences. As in our prior work, we focus on encoding feature vectors derived from images, because large datasets and feature extractors are readily available, and similarity between images is easy to visualize. After determining whether or not they are similar, the two image feature vectors are encoded independently to produce a pair of softmax-encoded DNA sequences. These sequences are passed to the hybridization predictor, which computes local matches in a small sliding window that allows for misalignments (Supplementary Fig. S3B), then performs pooling and convolution operations to produce a predicted yield.
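The sliding-window matching step can be illustrated with a small sketch. This is not the paper's trained predictor: the names `one_hot`, `local_matches`, and `toy_match_score` are hypothetical, the learned convolution is replaced here by a plain average, and the two sequences are compared directly, leaving out any handling of strand complementarity. Each position is represented as a probability distribution over A, C, G, T, so the same code accepts soft (softmax-style) or one-hot inputs.

```python
from typing import List, Sequence

Dist = Sequence[float]  # probabilities over (A, C, G, T) at one position


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA string as one distribution per position."""
    return [[1.0 if b == base else 0.0 for b in "ACGT"] for base in seq]


def local_matches(x: Sequence[Dist], y: Sequence[Dist],
                  window: int = 1) -> List[List[float]]:
    """For each position i in x, score x[i] against y[i + d] for every
    offset d in [-window, window]; offsets outside y score 0."""
    rows = []
    for i in range(len(x)):
        row = []
        for d in range(-window, window + 1):
            j = i + d
            if 0 <= j < len(y):
                row.append(sum(a * b for a, b in zip(x[i], y[j])))
            else:
                row.append(0.0)
        rows.append(row)
    return rows


def toy_match_score(x: Sequence[Dist], y: Sequence[Dist],
                    window: int = 1) -> float:
    """Assumed stand-in for the learned layers: max-pool over offsets,
    then average over positions. Returns a value in [0, 1]."""
    if not x:
        raise ValueError("x must contain at least one position")
    rows = local_matches(x, y, window)
    return sum(max(row) for row in rows) / len(rows)
```

With `window=0` only aligned positions are compared, so a one-base shift between otherwise identical sequences scores poorly. Widening the window lets shifted matches count, which is the sense in which a sliding window allows for misalignments.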

Discussion

Methods

Code availability

Journal: Nature Communications
Publication Date: Aug 6, 2021
Citations: 45
License type: open-access

