Abstract

A technology capable of sequencing individual protein molecules would revolutionize our understanding of biological processes. Nanopore technology can analyze single heteropolymer molecules such as DNA by measuring the ionic current flowing through a single nanometer-scale hole in an electrically insulating membrane. This current is sensitive to the monomer sequence. However, proteins are remarkably complex, and identifying a single residue change in a protein remains a challenge. In this work, I show that simple neural networks can be trained to recognize protein mutants. Although these networks are trained quickly and efficiently, they generalize poorly to independent experiments. Two measures aimed at reducing variability in the training data improve the generalizability of the trained network: applying a thermal annealing protocol to the nanopore sample, and examining many mutants with the same nanopore sensor. Using this approach, I obtain 100% correct assignment among 9 mutants in >50% of the experiments. Interestingly, the neural network performance, compared to a random guess, improves as more mutants are included in the dataset for discrimination. Engineered nanopores prepared with high homogeneity, coupled with state-of-the-art analysis of the ionic current signals, may enable single-molecule protein sequencing.
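The comparison against a random guess can be made concrete with a little arithmetic. The sketch below assumes balanced classes, so that a uniform random guess among N mutants is correct with probability 1/N, and it assumes performance is expressed as a ratio of observed accuracy to that baseline. The abstract does not say how the comparison was defined, so both assumptions, the function names, and the example accuracies are hypothetical illustrations rather than the author's method or results.

```python
def chance_accuracy(n_classes: int) -> float:
    """Expected accuracy of a uniform random guess among balanced classes."""
    if n_classes < 2:
        raise ValueError("need at least two classes to discriminate")
    return 1.0 / n_classes


def fold_over_chance(accuracy: float, n_classes: int) -> float:
    """Observed accuracy divided by the random-guess baseline (an assumed metric)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    return accuracy / chance_accuracy(n_classes)


if __name__ == "__main__":
    # Hypothetical accuracies, chosen only to illustrate the arithmetic;
    # they are not results reported in the paper.
    for n_classes, accuracy in [(2, 0.90), (9, 0.90)]:
        baseline = chance_accuracy(n_classes)
        ratio = fold_over_chance(accuracy, n_classes)
        print(f"{n_classes} mutants: chance={baseline:.3f}, fold over chance={ratio:.2f}")
```

Under these assumptions the baseline for 9 mutants is 1/9, or about 11%, so the same raw accuracy represents a much larger margin over chance with 9 mutants than with 2.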

Highlights

  • The number of protein species produced by a genome vastly exceeds the number of genes (Ponomarenko et al, 2016; Smith and Kelleher, 2013)

  • The α-HL nanopore has been used to record single thioredoxin molecules unfolding and translocating through the nanopore (Rodriguez-Larrea and Bayley, 2013), and the ionic current signal has been shown to be modulated by phosphorylation of residue #100 (Rosen et al, 2014)

  • I have shown that NNs can learn from the ionic currents produced by single protein molecules translocating a nanopore to discern single residue mutations

Introduction

The number of protein species produced by a genome vastly exceeds the number of genes (Ponomarenko et al, 2016; Smith and Kelleher, 2013). An additional level of complexity is the concentration at which each proteoform is found (Ghaemmaghami et al, 2003); notably, this variable determines the phenotypic outcome. The size and composition of the proteome remain largely unknown (Aebersold and Mann, 2016). We lack appropriate methods to analyze the enormous complexity of the proteome. Bottom-up proteomics can identify thousands of proteins in a complex mixture, but because it analyzes peptide fragments, it produces a puzzle that cannot be unambiguously solved (Schaffer et al, 2019). Top-down proteomics can distinguish between closely related proteoforms (Donnelly et al, 2019)
