GradHC: highly reliable gradual hash-based clustering for DNA storage systems.

Dvir Ben Shabat, Adar Hadad, Avital Boruchovsky, Eitan Yaakobi

doi:10.1093/bioinformatics/btae274

Abstract

As data storage challenges grow and existing technologies approach their limits, synthetic DNA is emerging as a promising storage medium thanks to its remarkable density and durability. While cost remains a concern, emerging sequencing and synthesis technologies aim to reduce it, yet they introduce challenges such as errors in the storage and retrieval process. One crucial task in a DNA storage system is clustering the numerous DNA reads into groups that represent the original input strands. In this paper, we review different methods for evaluating clustering algorithms and introduce a novel clustering algorithm for DNA storage systems, named Gradual Hash-based clustering (GradHC). The primary strength of GradHC lies in its ability to cluster a wide variety of designs with excellent accuracy, including varying strand lengths, cluster sizes (including extremely small clusters), and error ranges. Benchmark analysis demonstrates that GradHC is significantly more stable and robust than other clustering algorithms previously proposed for DNA storage, while also producing highly reliable clustering results. The source code is available at https://github.com/bensdvir/GradHC.
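The abstract does not spell out GradHC's hashing scheme, so the following is only a generic sketch of hash-based read clustering, not GradHC itself. Reads that share a MinHash value over their k-mers become candidate pairs, and a candidate pair is merged only if its edit distance is within a threshold. The MinHash bucketing, the parameters `k`, `num_hashes`, and `max_dist`, and all function names are illustrative assumptions. Because the abstract also mentions evaluating clustering algorithms without naming the measures, `gamma_accuracy` implements, again as an assumption, one common style of score: the fraction of true clusters for which some output cluster lies entirely inside it and holds at least a γ fraction of its reads.

```python
# Generic MinHash + edit-distance clustering sketch.
# NOT GradHC's algorithm; names and parameters are hypothetical.
import hashlib
from itertools import combinations


def kmers(read: str, k: int) -> set[str]:
    """All length-k substrings of a read (the whole read if it is shorter than k)."""
    if len(read) < k:
        return {read}
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def minhash(read: str, k: int, seed: int) -> int:
    """One MinHash value: the smallest seeded 64-bit hash over the read's k-mers."""
    return min(
        int.from_bytes(hashlib.blake2b(f"{seed}:{km}".encode(), digest_size=8).digest(), "big")
        for km in kmers(read, k)
    )


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def cluster_reads(reads: list[str], k: int = 4, num_hashes: int = 6,
                  max_dist: int = 3) -> list[list[int]]:
    """Group read indices: reads sharing any MinHash value become candidate
    pairs, and a candidate pair is merged only if edit distance <= max_dist."""
    parent = list(range(len(reads)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Bucket reads by (hash function, MinHash value).
    buckets: dict[tuple[int, int], list[int]] = {}
    for seed in range(num_hashes):
        for idx, read in enumerate(reads):
            buckets.setdefault((seed, minhash(read, k, seed)), []).append(idx)

    # Verify each candidate pair with the exact distance before merging.
    for members in buckets.values():
        for i, j in combinations(members, 2):
            ri, rj = find(i), find(j)
            if ri != rj and edit_distance(reads[i], reads[j]) <= max_dist:
                parent[ri] = rj

    groups: dict[int, list[int]] = {}
    for idx in range(len(reads)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


def gamma_accuracy(true_clusters, output_clusters, gamma: float) -> float:
    """Fraction of true clusters C for which some output cluster D satisfies
    D is a subset of C and |D| >= gamma * |C|."""
    outputs = [set(d) for d in output_clusters]
    hits = sum(
        any(d <= c and len(d) >= gamma * len(c) for d in outputs)
        for c in map(set, true_clusters)
    )
    return hits / len(true_clusters)


if __name__ == "__main__":
    reads = [
        "ACGTACGTTGCAACGT",  # strand A
        "ACGTACGTTGCAACGT",  # exact copy of A
        "ACGTACCTTGCAACGT",  # A with one substitution
        "GGGGGGGGCCCCCCCC",  # strand B
        "GGGGGGGCCCCCCCC",   # B with one deletion
    ]
    clusters = cluster_reads(reads)
    print(clusters)
    print(gamma_accuracy([[0, 1, 2], [3, 4]], clusters, gamma=0.5))
```

Exact copies such as reads 0 and 1 always share every MinHash value and are merged; whether the substituted read 2 joins them depends on its k-mer overlap with strand A, which is the usual recall trade-off of hashing-based candidate generation. Raising `num_hashes` makes such a miss less likely at the cost of more hashing, while the edit-distance check keeps reads from different strands apart.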

Journal: Bioinformatics (Oxford, England)
Publication Date: Apr 22, 2024
License type: CC BY 4.0

Similar Papers

Weakly mutually uncorrelated codes with maximum run length constraint for DNA storage
Xiaozhou Lu ... Sunghwan Kim
Computers in Biology and Medicine | VOL. 165 | 03 Sep 2023

NOREC4DNA: using near-optimal rateless erasure codes for DNA storage
Peter Michael Schwarz ... Bernd Freisleben
BMC Bioinformatics | VOL. 22 | 17 Aug 2021

GCNSA: DNA storage encoding with a graph convolutional network and self-attention
Ben Cao ... Qiang Zhang
iScience | VOL. 26 | 19 Feb 2023

RETRACTED: Bionic‐structure thermo‐responsive (best) hydrogels with controllable layer for high‐capacity DNA data storage
Zhongjie Fei ... Chu Cheng
Nano Select | VOL. 5 | 07 Dec 2022
