Abstract

Large-scale genomic alterations play an important role in disease, gene expression, and chromosome evolution. Optical DNA mapping (ODM), commonly categorized into sparsely-labelled ODM and densely-labelled ODM, provides sequence-specific continuous intensity profiles (DNA barcodes) along single DNA molecules and is well suited for detecting such alterations. For sparsely-labelled barcodes, the detection of large genomic alterations has been investigated extensively, while densely-labelled barcodes have received less attention. In this work, we introduce HMMSV, a hidden Markov model (HMM) based algorithm for detecting structural variations (SVs) directly in densely-labelled barcodes without access to sequence information. We evaluate our approach using simulated datasets with five different types of SVs, and combinations thereof, and demonstrate that the method reaches a true positive rate greater than 80% for randomly generated barcodes with single variations of size 25 kilobases (kb). Increasing the length of the SV further leads to higher true positive rates. For a real dataset with experimental barcodes on bacterial plasmids, we successfully detect matching barcode pairs and SVs without any particular assumption about the types of SVs present. Instead, our method effectively goes through all possible combinations of SVs. Since ODM works on length scales typically not reachable with other techniques, our methodology is a promising tool for identifying arbitrary combinations of genomic alterations.

Highlights

  • Optical DNA mapping (ODM) provides a sequence-specific fluorescence “fingerprint” (DNA barcode) for single DNA molecules, which is well suited for analyzing ultra-long DNA molecules (> 10⁵ basepairs long)

  • We ran our hidden Markov model (HMM) over a grid of parameter values on noise-added random barcodes containing structural variations (SVs) in order to estimate true positive and true negative rates

  • Because our method is complemented by p-value thresholding, most false positives are discarded in post-processing, so we select parameters based on the true positive rate
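
The parameter selection described in the highlights can be illustrated with a minimal sketch. It is not the HMMSV implementation: the function names, the threshold `alpha`, the parameter values, and the toy p-values are all assumptions made up for illustration. It only shows one generic way to compute true positive and true negative rates after p-value thresholding and then pick the grid point with the highest true positive rate.

```python
import numpy as np

def rates_after_thresholding(p_values, has_sv, alpha=0.01):
    """Return (TPR, TNR) when a barcode is called 'SV present' if p < alpha.

    p_values : one p-value per simulated barcode (hypothetical input)
    has_sv   : ground truth, True where an SV was really inserted
    """
    p_values = np.asarray(p_values, dtype=float)
    has_sv = np.asarray(has_sv, dtype=bool)
    called = p_values < alpha                      # calls that survive thresholding
    tp = np.sum(called & has_sv)                   # correctly called SVs
    tn = np.sum(~called & ~has_sv)                 # correctly rejected non-SVs
    tpr = tp / has_sv.sum() if has_sv.any() else float("nan")
    tnr = tn / (~has_sv).sum() if (~has_sv).any() else float("nan")
    return tpr, tnr

def select_by_tpr(results_by_param, alpha=0.01):
    """Score every grid point and return the one with the highest TPR."""
    scored = {
        param: rates_after_thresholding(p, truth, alpha)
        for param, (p, truth) in results_by_param.items()
    }
    best = max(scored, key=lambda param: scored[param][0])
    return best, scored

# Toy grid: two made-up parameter values, four simulated barcodes each.
toy_grid = {
    0.1: ([0.001, 0.2, 0.6, 0.03], [True, True, False, False]),
    0.5: ([0.001, 0.004, 0.6, 0.03], [True, True, False, False]),
}
best_param, scores = select_by_tpr(toy_grid, alpha=0.01)
print(best_param, scores)
```

Selecting on the true positive rate alone is only reasonable here because the thresholding step is assumed to keep the false positive count low for every grid point.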


Summary

Introduction

Optical DNA mapping (ODM) provides a sequence-specific fluorescence “fingerprint” (DNA barcode) for single DNA molecules, which is well suited for analyzing ultra-long DNA molecules (> 10⁵ basepairs (bp) long). The barcodes are created by fluorescently labelling individual DNA molecules in a sequence-specific manner, stretching the molecules using nanochannels or surface adsorption, and imaging them using a fluorescence microscope [1]. The most common approach to DNA labelling is sparse enzymatic labelling. The output of this approach is an array of sequence-specific “dots” along the DNA. Individual dots are not discernible (the resolution of a single dot is described by a point spread function with a width σ_psf, typically around 1 kb) and, rather, the output is a sequence-specific continuous intensity profile (barcode) along the DNA.
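
To make the role of the point spread function concrete, the following sketch blurs point-like labels with a Gaussian of width σ_psf = 1 kb and samples the result on a pixel grid. It is an illustration under stated assumptions, not the imaging model used in the paper: the Gaussian shape, the pixel size of 500 bp, the label counts, and the function name `barcode_from_dots` are all hypothetical.

```python
import numpy as np

def barcode_from_dots(dot_positions_bp, length_bp,
                      sigma_psf_bp=1000.0, bp_per_pixel=500.0):
    """Blur point-like labels with a Gaussian PSF and sample on a pixel grid.

    sigma_psf_bp : assumed PSF width in basepairs (about 1 kb, as in the text)
    bp_per_pixel : assumed pixel size in basepairs (hypothetical value)
    """
    n_pixels = int(np.ceil(length_bp / bp_per_pixel))
    centers_bp = (np.arange(n_pixels) + 0.5) * bp_per_pixel   # pixel centres
    dots = np.asarray(dot_positions_bp, dtype=float)
    # Distance from every pixel centre to every dot: shape (n_pixels, n_dots).
    diff = centers_bp[:, None] - dots[None, :]
    # Each dot contributes one Gaussian; the barcode is their sum per pixel.
    return np.exp(-0.5 * (diff / sigma_psf_bp) ** 2).sum(axis=1)

rng = np.random.default_rng(0)
length_bp = 100_000
sparse_dots = np.sort(rng.uniform(0, length_bp, size=8))     # few labels
dense_dots = np.sort(rng.uniform(0, length_bp, size=400))    # many labels

sparse_profile = barcode_from_dots(sparse_dots, length_bp)
dense_profile = barcode_from_dots(dense_dots, length_bp)
print(sparse_profile.shape, dense_profile.shape)
```

With only a few labels the profile shows separated peaks, while with many labels spaced well below σ_psf the Gaussians overlap and only a continuous intensity profile remains.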
