Abstract

Many groups of proteins share a similar function, but the determinants of functional specificity may be hidden by a lack of sequence similarity, or by large groups of similar sequences with different functions. Transporters are one such group: the general function, transport, can be easily inferred from the sequence, but the substrate specificity can be impossible to predict from sequence with current methods. In this paper we describe a linguistic-based approach to identifying functional patterns in groups of unaligned protein sequences, and its application to predicting multi-drug resistance transporters (MDRs) from bacteria. We first show that, starting from unaligned sequences, our method can recreate known PROSITE patterns for several motifs. We then show that the method, MDRpred, can predict MDRs with greater accuracy and positive predictive value than a collection of currently available family-based models from the Pfam database. Finally, we apply MDRpred to a large collection of protein sequences from an environmental microbiome study to make novel predictions about drug resistance in a potential environmental reservoir.
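To make the two ideas in the abstract concrete, the sketch below translates a PROSITE-style pattern into a regular expression, scans unaligned sequences with it, and scores the resulting predictions by accuracy and positive predictive value. It is an illustration only, not the authors' method: the function names, the toy sequences, and the toy labels are all hypothetical, and the example pattern is the well-known N-glycosylation motif `N-{P}-[ST]-{P}` rather than any MDR-specific pattern.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate basic PROSITE pattern syntax into a Python regex (hypothetical helper)."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # '<' anchors to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # '>' anchors to the C-terminus
            suffix, element = "$", element[:-1]
        # Split an element such as 'x(2,4)' into its core and repeat counts.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                      # 'x' means any residue
            core = "."
        elif core.startswith("{"):           # '{ABC}' means any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def accuracy_and_ppv(predicted, actual):
    """Accuracy = (TP + TN) / N; positive predictive value = TP / (TP + FP)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    accuracy = (tp + tn) / len(actual)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return accuracy, ppv

# Toy, made-up data: unaligned sequences and made-up class labels.
regex = re.compile(prosite_to_regex("N-{P}-[ST]-{P}"))
sequences = ["MKTNGSAA", "MKTNPSAA", "AAAANATK", "GGGGGGGG"]
labels = [True, False, False, False]

predictions = [regex.search(seq) is not None for seq in sequences]
accuracy, ppv = accuracy_and_ppv(predictions, labels)
print(regex.pattern, predictions, accuracy, ppv)
```

Positive predictive value is reported alongside accuracy because, when true positives are rare among the candidates, a classifier can be accurate overall while most of its positive calls are still wrong.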

Highlights

  • Gram-negative bacteria are a major cause of many human diseases, and the emergence of antibiotic resistance makes finding new means to combat them a pressing international health issue

  • Traditional methods of identifying antibiotic resistance transporters: We first evaluated how well previously generated hidden Markov models (HMMs) from the Pfam database could discriminate between multi-drug resistance transporters (MDRs) and non-MDR transporters

  • Proactive Intelligent Learning with Grammar (PILGram) model training: We examined the ability of PILGram to find patterns capable of distinguishing MDR transporters from other transporter sequences


Summary

METHOD ARTICLE

Prediction of multi-drug resistance transporters using a novel sequence analysis method [version 1; peer review: 2 approved with reservations].

Introduction
Methods
Results
Discussion and conclusions
Data & Code Availability
