Abstract

Nanopore sequencing device analysis systems simultaneously generate multiple picoampere-level current signals ("squiggles") representing the passage of DNA or RNA nucleotides ratcheted through a biomolecule nanopore array by motor proteins. Because of experimental and algorithmic artefacts, squiggles are a noisy and time-distorted representation of the underlying nucleotide sequence, the "gold standard model". Other research fields use dynamic time warped-space averaging (DTWA) algorithms to produce a consensus signal from multiple time-warped sources while preserving key features that standard linear averaging distorts. We compared the ability of the DTW Barycentre Averaging (DBA), minimize mean (MM) and stochastic sub-gradient descent (SSG) DTWA algorithms to generate a consensus signal from squiggle-space ensembles of the RNA molecules Enolase, Sequin R1-71-1 and Sequin R2-55-3 without knowledge of their associated gold standard models. We propose techniques to identify the leader and distorted squiggle features prior to DTWA consensus generation. New visualization and warping-path metrics are introduced to compare the consensus signals with the best estimate of the "true" consensus, the study's gold standard model. The DBA consensus was the best match to the gold standard in both Sequin studies but was outperformed in the Enolase study. Given an underlying characteristic common to the squiggle ensemble, we objectively evaluate a novel "voting scheme" that improves the local similarity between the consensus signal and a given fraction of the squiggle ensemble.
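To illustrate why DTWA preserves features that linear averaging smears, here is a minimal pure-Python sketch of the DBA idea. It assumes squared-difference local costs and initialises the consensus from the first ensemble member (both simplifying choices); the function names dtw_path and dba are hypothetical and not taken from the study's code. Each DBA iteration aligns every ensemble member to the current consensus and replaces each consensus sample with the mean of the samples aligned to it.

```python
import math


def dtw_path(a, b):
    """Optimal DTW warping path between a and b as (i, j) index pairs,
    using squared differences as the local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, stepping to the cheapest predecessor
    # (ties favour the diagonal step because it is listed first).
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda s: cost[s[0]][s[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def dba(ensemble, n_iter=10):
    """DTW Barycentre Averaging: each consensus sample becomes the mean of
    all ensemble samples that DTW aligns to it, repeated n_iter times."""
    consensus = list(ensemble[0])  # simple initialisation (assumption)
    for _ in range(n_iter):
        buckets = [[] for _ in consensus]
        for series in ensemble:
            for ci, si in dtw_path(consensus, series):
                buckets[ci].append(series[si])
        # Every consensus index lies on every path, so no bucket is empty.
        consensus = [sum(b) / len(b) for b in buckets]
    return consensus


base = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
stretched = [0.0, 1.0, 1.0, 3.0, 3.0, 1.0, 0.0]
shifted = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0]
ensemble = [base, stretched, shifted]

consensus = dba(ensemble)
linear = [sum(col) / len(col) for col in zip(*ensemble)]
print(max(consensus))  # 3.0: the peak survives DTW averaging
print(max(linear))     # about 2.33: sample-by-sample averaging smears the peak
```

The three toy squiggles share a peak of 3 at different times. DBA keeps the peak at full height, whereas the linear mean flattens it. Real squiggle ensembles would need many more samples, a better initialisation (for example an ensemble medoid) and a convergence check.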
Although the gold standard is not used during voting, the improved match of the final voted-on consensus to the underlying Enolase and Sequin gold standard sequences provides an indirect measure of the voting procedure's success in two ways. First, the least squares warped distance between the final consensus and the gold model decreases; second, voting produces a final consensus length closer to the known length of the underlying RNA biomolecule. The results suggest considerable potential in combining squiggle analysis with voted-on DTWA consensus signals to provide low-noise, low-distortion signals. This will lead to improved accuracy in detecting nucleotides and their deviations from the model caused by chemical modifications (a.k.a. epigenetic information). The proposed combination of ensemble voting and DTWA also has applications in other research fields involving time-distorted, high-entropy signals.
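The first indirect success measure compares the final consensus with the gold model using a least squares warped distance. Assuming this means the sum of squared differences along the optimal DTW warping path (our reading, not a definition quoted from the study), it can be sketched as follows; dtw_sq_distance is a hypothetical name.

```python
import math


def dtw_sq_distance(a, b):
    """Sum of squared differences along the optimal DTW warping path.
    Only two rows of the cumulative-cost matrix are kept in memory."""
    m = len(b)
    prev = [0.0] + [math.inf] * m  # row 0: only the origin is reachable
    for x in a:
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            d = (x - b[j - 1]) ** 2
            cur[j] = d + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


gold = [0.0, 1.0, 3.0, 1.0, 0.0]
stretched = [0.0, 0.0, 1.0, 3.0, 3.0, 1.0, 0.0]  # same shape, played slower
flattened = [0.0, 1.0, 2.0, 1.0, 0.0]            # same length, peak reduced

print(dtw_sq_distance(gold, stretched))  # 0.0
print(dtw_sq_distance(gold, flattened))  # 1.0
```

A time-stretched copy of the gold model scores zero even though its length differs, while a same-length signal with a reduced peak does not. This is why a warped distance, rather than a sample-by-sample one, suits time-distorted squiggles.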

Highlights

  • Picoampere-level current signals are sampled at 3000 or 4000 Hz as the nucleotides of a DNA or RNA molecule are ratcheted by motor proteins through the nanopore biomolecules of a sensor array [1] on a device such as the Oxford Nanopore MinION sequencer [2,3].

  • To aid interpretation of the differences between dynamic time warped-space averaging (DTWA) consensus signals, we investigated a difference from identity (DFI) warp path display (Fig 5C).

  • We currently offer no explanation of which characteristics of the minimize mean (MM) DTWA consensus generated in the Sequin-R2-55-3 study (dotted blue line) make it so similar to the other consensus signals when compared with the gold standard (Fig 12A), yet so obviously inconsistent in its normalized mean DTWDISTANCE minimum behaviour when compared with the ensemble (Fig 12B).
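The DFI display is not defined in this summary. One plausible reading, used as an assumption in the minimal sketch below, is to plot how far each point of a DTW warping path departs from the identity diagonal joining the two signals' first and last samples: identical signals give a flat line at zero, and time shifts or stretches show up as excursions. The names dtw_path and difference_from_identity are hypothetical.

```python
import math


def dtw_path(a, b):
    """Optimal DTW warping path between a and b as (i, j) index pairs,
    using squared differences as the local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack to the origin; ties favour the diagonal step (listed first).
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda s: cost[s[0]][s[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def difference_from_identity(path, n, m):
    """Hypothetical DFI: for each (i, j) on the path, the offset of j from
    the straight line joining (0, 0) to (n - 1, m - 1)."""
    scale = (m - 1) / (n - 1) if n > 1 else 0.0
    return [j - i * scale for i, j in path]


a = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
b = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0]  # a delayed by one sample

same = difference_from_identity(dtw_path(a, a), len(a), len(a))
delayed = difference_from_identity(dtw_path(a, b), len(a), len(b))
print(same)          # all zeros: the path follows the identity line
print(max(delayed))  # positive: b lags a around the peak
```

Scaling by (m - 1) / (n - 1) lets signals of different lengths be compared against the same diagonal, so a consensus that is uniformly longer than its reference does not register as a local distortion.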

Author summary

Nanopore sensor arrays, essentially matrices full of microscopic pores, provide an interesting new route to identifying changes in DNA/RNA sequences related to diseases. We generate consensus signals, using dynamic time warped-space averaging (DTWA), from multiple noisy signals that are so warped that classical averaging techniques fail. To further improve the quality of the consensus signal, we introduced the idea of allowing the noisy ensemble of signals as a whole to vote on whether specific DTWA consensus components were valid or still a misread. Although areas for further improvement have been identified, the voted-DTWA approach already provides cleaner consensus estimates from experimental RNA studies. This is a PLOS Computational Biology Methods paper.
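The voting procedure itself is not specified in this summary. Purely as a hypothetical illustration of letting the ensemble vote on consensus components, and not as the authors' scheme, the sketch below keeps a consensus sample only if at least a fraction quorum of the ensemble members has a DTW-aligned sample within tol of it. The rule, the thresholds and the names vote_on_consensus, tol and quorum are all assumptions.

```python
import math


def dtw_path(a, b):
    """Optimal DTW warping path between a and b as (i, j) index pairs,
    using squared differences as the local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack to the origin; ties favour the diagonal step (listed first).
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda s: cost[s[0]][s[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def vote_on_consensus(consensus, ensemble, tol=0.5, quorum=0.5):
    """HYPOTHETICAL voting rule: keep consensus sample k only if at least
    quorum * len(ensemble) members align some sample within tol of it."""
    votes = [0] * len(consensus)
    for series in ensemble:
        supported = set()  # each member casts at most one vote per sample
        for ci, si in dtw_path(consensus, series):
            if abs(consensus[ci] - series[si]) <= tol:
                supported.add(ci)
        for ci in supported:
            votes[ci] += 1
    needed = quorum * len(ensemble)
    return [x for x, v in zip(consensus, votes) if v >= needed]


consensus = [0.0, 1.0, 3.0, 9.0, 3.0, 1.0, 0.0]  # 9.0 mimics a misread spike
ensemble = [
    [0.0, 1.0, 3.0, 3.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 3.0, 1.0, 0.0],
    [0.0, 1.0, 3.0, 1.0, 1.0, 0.0],
]
print(vote_on_consensus(consensus, ensemble))  # [0.0, 1.0, 3.0, 3.0, 1.0, 0.0]
```

No ensemble member supports the 9.0 spike, so it is voted out and the consensus becomes one sample shorter. This loosely mirrors, without reproducing, the study's observation that voting moved the consensus length closer to the known molecule length.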
