Abstract

There are many examples of groups of proteins that have similar function, but the determinants of functional specificity may be hidden by lack of sequence similarity, or by large groups of similar sequences with different functions. Transporters are one such protein group in that the general function, transport, can be easily inferred from the sequence, but the substrate specificity can be impossible to predict from sequence with current methods. In this paper we describe a linguistic-based approach to identify functional patterns from groups of unaligned protein sequences and its application to predict multi-drug resistance transporters (MDRs) from bacteria. We first show that our method can recreate known patterns from PROSITE for several motifs from unaligned sequences. We then show that the method, MDRpred, can predict MDRs with greater accuracy and positive predictive value than a collection of currently available family-based models from the Pfam database. Finally, we apply MDRpred to a large collection of protein sequences from an environmental microbiome study to make novel predictions about drug resistance in a potential environmental reservoir.

Highlights

  • Gram-negative bacteria are a major cause of many human diseases and, due to the emergence of antibiotic resistance, new means to combat them are a pressing international health issue

  • Alignment-free identification of discriminatory protein patterns in PROSITE: To test the ability of Proactive Intelligent Learning with Grammar (PILGram) to identify discriminatory regular expressions from unaligned sequences, we focused on a well-defined group of proteins with a known discriminatory pattern

  • PILGram model training: We examined the ability of PILGram to find patterns capable of distinguishing multi-drug resistance transporters (MDRs) from other transporter sequences
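
The idea of a discriminatory regular expression can be illustrated with a minimal sketch that scores one fixed pattern as a classifier over unaligned sequences, reporting the accuracy and positive predictive value mentioned in the abstract. This is not PILGram and does not learn anything; it only evaluates a pattern that is already given. The function name `evaluate_pattern` is hypothetical, the sequences are invented toy strings, and the pattern is the well-known PROSITE P-loop motif `[AG]-x(4)-G-K-[ST]` (PS00017), chosen purely for illustration rather than taken from the paper.

```python
import re


def evaluate_pattern(pattern, positives, negatives):
    """Score a regular expression as a binary classifier.

    A sequence is predicted positive if the pattern matches anywhere in it,
    so no alignment of the sequences is required.
    """
    regex = re.compile(pattern)
    tp = sum(1 for seq in positives if regex.search(seq))
    fp = sum(1 for seq in negatives if regex.search(seq))
    fn = len(positives) - tp
    tn = len(negatives) - fp
    total = tp + fp + fn + tn
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "tn": tn,
        # Fraction of all sequences classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Fraction of predicted positives that are true positives.
        "ppv": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# PROSITE-style pattern [AG]-x(4)-G-K-[ST] written as a Python regex.
P_LOOP = r"[AG].{4}GK[ST]"

# Invented toy sequences (assumption: not real proteins, illustration only).
toy_positives = ["MTGAPNSGKSTLL", "MSLAGPSGSGKTTL", "MKVLLDPRGKSWQ"]
toy_negatives = ["MKLVFFAEDVGSN", "MTTPLLNNQQRRS", "MQAWERTGKSLLP"]

scores = evaluate_pattern(P_LOOP, toy_positives, toy_negatives)
print(scores)
```

A pattern-discovery method would search over many candidate expressions and keep those that score well on a measure like this. How PILGram performs that search is not described in this summary.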


Summary

METHOD ARTICLE

Prediction of multi-drug resistance transporters using a novel sequence analysis method [version 2; peer review: 2 approved].

Introduction
Methods
Results
Discussion and conclusions
Data & Code Availability
