NeuRiPP: Neural network identification of RiPP precursor peptides

Emmanuel L C De Los Santos

doi:10.1038/s41598-019-49764-z

Open Access

https://doi.org/10.1038/s41598-019-49764-z

Journal: Scientific Reports	Publication Date: Sep 16, 2019
Citations: 56	License type: open-access

Affiliation: University of Warwick

Abstract

Significant progress has been made in the past few years on the computational identification of biosynthetic gene clusters (BGCs) that encode ribosomally synthesized and post-translationally modified peptides (RiPPs). This is done by identifying both RiPP tailoring enzymes (RTEs) and RiPP precursor peptides (PPs). However, identification of PPs, particularly for novel RiPP classes, remains challenging. To address this, machine learning has been used to accurately identify PP sequences. Current machine learning tools have limitations: they are specific to the RiPP class they are trained for, and they are context-dependent, requiring information about the genetic environment surrounding the putative PP sequences. NeuRiPP overcomes these limitations. It does this by leveraging the rich data set of high-confidence putative PP sequences from existing programs, along with experimentally verified PPs from RiPP databases. NeuRiPP uses neural network architectures that are suitable for peptide classification, with weights trained on PP datasets. It is able to identify known PP sequences, as well as sequences that are likely PPs. When tested on existing RiPP BGC datasets, NeuRiPP was able to identify PP sequences in significantly more putative RiPP clusters than current tools while maintaining the same HMM hit accuracy. Finally, NeuRiPP was able to successfully identify PP sequences from novel RiPP classes that were recently characterized experimentally, highlighting its utility in complementing existing bioinformatics tools.

Highlights

Specialized metabolites from bacteria have been a source of bioactive chemical compounds with myriad applications, especially in the pharmaceutical and agrochemical industries[1].
To check that the high accuracy was not due to the neural network being overfit to the data, the models were trained on a dataset that randomly excluded 15% of the positive dataset (550 sequences) and 8.6% of the negative set (1650 sequences).
Peptide sequences classified as NeuRiPP hits show a similar or higher hidden Markov model (HMM) hit rate compared with precursor peptide predictions in existing tools.
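
The overfitting check in the highlights above rests on a random hold-out: some sequences are excluded from training and kept for evaluation. The following is a minimal sketch of such a split, not the author's code. The function name `hold_out`, the seed, and the toy sequence identifiers are all hypothetical, and the toy set sizes are far smaller than the real datasets.

```python
import random


def hold_out(sequences, n_held_out, seed=0):
    """Randomly split `sequences` into (training, held_out) lists.

    Hypothetical helper for illustration; not part of NeuRiPP.
    """
    if not 0 <= n_held_out <= len(sequences):
        raise ValueError("n_held_out must be between 0 and len(sequences)")
    rng = random.Random(seed)   # local RNG so the split is reproducible
    shuffled = list(sequences)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled[n_held_out:], shuffled[:n_held_out]


# Toy stand-ins for the positive and negative sets (made-up identifiers).
positives = [f"pos_{i}" for i in range(100)]
negatives = [f"neg_{i}" for i in range(500)]

# Hold out 15% of the positives and about 8.6% of the negatives.
pos_train, pos_test = hold_out(positives, n_held_out=15, seed=1)
neg_train, neg_test = hold_out(negatives, n_held_out=43, seed=1)
```

A model would then be trained on `pos_train` and `neg_train` only, and its accuracy measured on the held-out lists it never saw during training.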

Summary

Introduction

Specialized metabolites from bacteria have been a source of bioactive chemical compounds with myriad applications, especially in the pharmaceutical and agrochemical industries[1]. Proper identification of PPs is an important aspect of in silico RiPP BGC analysis, as knowledge of the PP sequence can aid in structure elucidation and provide information on the molecular interactions between the RTEs and the PP[5]. To this end, several methods have been developed to identify putative PPs in regions in proximity to RTEs. This typically involves a two-step process where sequences to be screened are first identified either through the use of gene-finding software[5,7] or by identifying open reading frames (ORFs) of specified length in the proximity of RTEs[6,8]. Prodigal-short was used to find putative PPs in proximity to RTEs, and peptide similarity network analysis of the identified PPs was used to identify new RiPP classes[5]. This demonstrated the potential of using gene-finding software as a starting point for identifying novel RiPPs; however, the number of likely coding sequences from this approach was still large.
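
To make the "ORFs of specified length" screening step concrete, here is a minimal sketch of a length-filtered ORF scan. It is not Prodigal-short and not the procedure used in the cited tools. The function name `find_short_orfs`, the start-codon set, the default length bounds, and the forward-strand-only scan are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of bacterial start codons


def find_short_orfs(dna, min_aa=20, max_aa=120):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    An ORF is kept when its peptide length (codons from the start codon up
    to, but excluding, the stop codon) lies in [min_aa, max_aa]. `end` is
    exclusive and includes the stop codon. Hypothetical helper; a real
    screen would also scan the reverse complement.
    """
    dna = dna.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the first start codon since the last stop
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_aa <= n_codons <= max_aa:
                    hits.append((start, i + 3, frame))
                start = None
    return hits


# Toy sequence: one 26-codon ORF (ATG + 25 x GCT) followed by a stop codon.
toy_dna = "CC" + "ATG" + "GCT" * 25 + "TAA" + "GG"
toy_hits = find_short_orfs(toy_dna)
```

In a RiPP context, the candidate ORFs returned by a scan like this would be restricted to a window around a detected RTE and then passed to a classifier. As the text notes, the number of candidates from this kind of first pass can still be large.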

Methods

Results

Conclusion

