DNA sequencing by quantum tunneling heralds a paradigm shift in genetic analysis, promising rapid and accurate identification for diverse applications ranging from personalized medicine to security. However, the broad distribution of molecular conductance, the alignment of conduction orbitals required for resonant transport, and the decoding of overlapping conductance signals from isomorphic nucleotides have been persistent experimental hurdles to swift and precise identification. Herein, we report a machine learning (ML)-driven quantum tunneling study with a model solid-state nanogap to identify nucleotides at single-base resolution. The optimized ML basecaller achieves high basecalling accuracy for all four nucleotides across seven distinct data pools, each containing complex transmission readouts from the nucleotides' different dynamic conformations. ML classification of quaternary, ternary, and binary nucleotide combinations also achieves high precision, sensitivity, and F1 score. ML explainability analysis reveals how normalized features extracted from overlapping nucleotide signals contribute to improved classification. Moreover, analysis of the nucleotides' electronic fingerprints, conductance sensitivity, and current readouts indicates practical applicability, with significant sensitivity and distinguishability. Through this ML approach, our study pushes the boundaries of quantum sequencing by demonstrating effective single-nucleotide basecalling, with promising implications for genomics and molecular diagnostics.