Abstract

Background: The all-electronic Single Molecule Break Junction (SMBJ) method is an emerging alternative to traditional polymerase chain reaction (PCR) techniques for genetic sequencing and identification. Existing work indicates that the current spectra recorded from SMBJ experiments contain unique signatures that can identify known sequences in a dataset. However, the spectra are typically extremely noisy due to the stochastic and complex interactions between the substrate, sample, environment, and the measuring system, necessitating hundreds or thousands of experiments to obtain reliable and accurate results.

Results: This article presents a DNA sequence identification system based on the current spectra of ten short strand sequences, including a pair that differs by a single mismatch. By employing a gradient boosted tree classifier model trained on conductance histograms, we demonstrate that extremely high accuracy, ranging from approximately 96 % for molecules differing by a single mismatch to 99.5 % otherwise, is possible. Further, such accuracy metrics are achievable in near real-time with just twenty or thirty SMBJ measurements instead of hundreds or thousands. We also demonstrate that a tandem classifier architecture, where the first stage is a multiclass classifier and the second stage is a binary classifier, can be employed to boost the single mismatched pair’s identification accuracy to 99.5 %.

Conclusions: A monolithic classifier, or more generally, a multistage classifier with model-specific parameters that depend on experimental current spectra, can be used to successfully identify DNA strands.
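The tandem architecture described in the Results can be illustrated with a minimal routing sketch. This is not the authors' implementation: the class names, the labels `seq_A`, `seq_B`, and `seq_B_mismatch`, the two-element feature vectors, and the rule "defer to the binary stage whenever the first stage predicts either member of the confusable pair" are all assumptions made for illustration. A toy nearest-centroid model stands in for the gradient boosted tree classifiers so that the example runs with the standard library alone.

```python
import math


class NearestCentroid:
    """Toy stand-in classifier (assumption); the article uses gradient boosted trees."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        # Mean feature vector per class.
        self.centroids = {
            label: [v / counts[label] for v in acc] for label, acc in sums.items()
        }
        return self

    def predict_one(self, x):
        # Return the label whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda lab: math.dist(x, self.centroids[lab]))


class TandemClassifier:
    """Stage 1: multiclass over all labels. Stage 2: binary over a confusable pair."""

    def __init__(self, stage1, stage2, confusable):
        self.stage1 = stage1
        self.stage2 = stage2
        self.confusable = set(confusable)

    def fit(self, X, y):
        self.stage1.fit(X, y)
        # The binary stage is trained only on samples from the confusable pair.
        pair = [(x, lab) for x, lab in zip(X, y) if lab in self.confusable]
        self.stage2.fit([x for x, _ in pair], [lab for _, lab in pair])
        return self

    def predict_one(self, x):
        first = self.stage1.predict_one(x)
        # Hypothetical routing rule: re-examine only predictions inside the pair.
        if first in self.confusable:
            return self.stage2.predict_one(x)
        return first


# Made-up feature vectors (e.g. two histogram bins), purely illustrative.
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.35, 0.65]]
y = ["seq_A", "seq_A", "seq_B", "seq_B", "seq_B_mismatch", "seq_B_mismatch"]

model = TandemClassifier(
    NearestCentroid(), NearestCentroid(), confusable=("seq_B", "seq_B_mismatch")
).fit(X, y)
```

Calling `model.predict_one(features)` returns the first-stage label directly unless that label belongs to the confusable pair, in which case the binary stage decides. In practice the second stage would be a model tuned specifically to separate the single-mismatch pair, which is where the accuracy gain reported in the abstract would come from.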

Highlights

  • The all-electronic Single Molecule Break Junction (SMBJ) method is an emerging alternative to traditional polymerase chain reaction (PCR) techniques for genetic sequencing and identification

  • The conductance histograms from experimentally measured currents are intrinsically noisy, with values ranging over three orders of magnitude

  • The noise arises because the number of atoms involved is extremely large, the environment fluctuates, and there are a large number of DNA-contact configurations, which makes it impossible to model the system in a realistic manner
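Because the measured values span several orders of magnitude, a conductance histogram is naturally binned on a logarithmic axis. The sketch below shows one way a small batch of measurements could be pooled into a normalized feature vector for a classifier. The function names, the conductance range of 1e-5 to 1e-2 (three decades, in arbitrary units), the bin count, and the synthetic log-normal traces are all assumptions for illustration and are not taken from the article.

```python
import math
import random


def conductance_histogram(conductances, g_min=1e-5, g_max=1e-2, n_bins=30):
    """Normalized histogram of log10(G) over [g_min, g_max).

    The range and bin count are illustrative defaults, not values from the article.
    """
    lo, hi = math.log10(g_min), math.log10(g_max)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for g in conductances:
        if g <= 0:
            continue  # log10 is undefined for non-positive readings
        x = math.log10(g)
        if lo <= x < hi:
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def pooled_features(traces, **kwargs):
    """Pool the readings of a small batch of traces into one histogram."""
    pooled = [g for trace in traces for g in trace]
    return conductance_histogram(pooled, **kwargs)


# Synthetic stand-in data: 20 traces of 200 log-normally distributed readings.
rng = random.Random(0)
traces = [[10 ** rng.gauss(-3.5, 0.4) for _ in range(200)] for _ in range(20)]
features = pooled_features(traces)
```

The resulting `features` list has one entry per bin and sums to one, so batches with different numbers of readings remain comparable when passed to a classifier.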


Summary

Introduction

The all-electronic Single Molecule Break Junction (SMBJ) method is an emerging alternative to traditional polymerase chain reaction (PCR) techniques for genetic sequencing and identification. A promising approach is an all-electronic method that would identify DNA or proteins based on characteristic current measurements [7]. Such an all-electronic DNA sequence identification system experiences nonlinear interactions between the substrate, sample, environment, and measuring system that are inherently stochastic [8, 9]. It has therefore been extremely challenging for physics-based models to capture the differences in current between nominally different DNA strands. Identification of small molecules from a mixture was recently demonstrated using machine learning classification methods [15, 16, 17].

