Abstract

Background: Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins.

Results: Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the repeat may be disseminated by HGT and intra-genomic shuffling.

Conclusions: We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches.
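
The abstract notes that the DNA encoding the tandem domains displays dyad symmetry, meaning a stretch reads much like its own reverse complement. A minimal Python sketch of how such symmetry could be scored is given here. It is an illustration only, not the authors' analysis; the function names `reverse_complement` and `dyad_symmetry_score` are hypothetical, and the score assumes an ungapped comparison of plain A/C/G/T input.

```python
# Illustrative sketch only; names and scoring scheme are assumptions,
# not taken from the paper.

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def dyad_symmetry_score(seq: str) -> float:
    """Fraction of positions where seq matches its own reverse complement.

    A perfect palindrome (in the molecular-biology sense) scores 1.0.
    No gaps or spacers are considered in this simple version.
    """
    if not seq:
        return 0.0
    upper = seq.upper()
    rc = reverse_complement(upper)
    matches = sum(a == b for a, b in zip(upper, rc))
    return matches / len(upper)


if __name__ == "__main__":
    # An EcoRI site is a perfect palindrome; a homopolymer-like run is not.
    print(dyad_symmetry_score("GAATTC"))
    print(dyad_symmetry_score("GAAAAA"))
```

In practice one would slide a window over the coding sequence and allow a central spacer between the two arms; both refinements are omitted here to keep the sketch short.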

Highlights

  • Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products

  • Dispersal of a tandemly arrayed repeat protein sequence in the genomes of unicellular microbes from diverse phylogenetic clades and ecological niches: to establish an operational motif, we first constructed a hidden Markov model (HMM) based on a training dataset of ORFs from genomes of Mollicutes that contained a previously reported 25-residue amino acid sequence pattern [15], then refined the HMM using iterations of data sets expanded from successive searches of the non-redundant protein sequence database [24]

  • These ORFs represented individual sequences derived from the Global Ocean Sampling (GOS) project [25] and are not assigned to taxa
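
The highlights describe a 25-residue motif repeated in tandem, with conserved hydrophobic positions and variable hydrophilic ones. The Python sketch below shows one crude way to look for such a period in a protein sequence by comparing the hydrophobic/hydrophilic pattern with a shifted copy of itself. It is not the HMM-based procedure the authors used; the residue set `HYDROPHOBIC`, the function names, and the lag range are all assumptions chosen for illustration.

```python
# Illustrative sketch only; this is a simple self-comparison, not the
# hidden Markov model described in the source.

# Assumed hydrophobic residue set; other classifications exist.
HYDROPHOBIC = set("AVILMFWC")


def hydrophobic_mask(protein: str) -> list[bool]:
    """Mark each residue as hydrophobic (True) or not (False)."""
    return [res in HYDROPHOBIC for res in protein.upper()]


def lag_agreement(protein: str, lag: int) -> float:
    """Fraction of positions whose hydrophobic class matches the
    residue `lag` positions downstream."""
    mask = hydrophobic_mask(protein)
    n = len(mask) - lag
    if lag <= 0 or n <= 0:
        return 0.0
    return sum(mask[i] == mask[i + lag] for i in range(n)) / n


def best_period(protein: str, min_lag: int = 2, max_lag: int = 60) -> int:
    """Return the lag with the highest agreement (smallest lag wins ties)."""
    upper = min(max_lag, len(protein) - 1)
    if upper < min_lag:
        raise ValueError("sequence too short for the requested lag range")
    return max(range(min_lag, upper + 1),
               key=lambda k: lag_agreement(protein, k))


if __name__ == "__main__":
    # Two made-up 25-residue units sharing hydrophobic positions but
    # differing at hydrophilic ones, arrayed in tandem.
    unit_a = "LKELIKDSEKTVNQEKDSFKETDNQ"
    unit_b = "ITDVLENQKSDAKTEDNSWDKQETN"
    toy = unit_a + unit_b + unit_a + unit_b
    print(best_period(toy))
```

Reducing residues to two classes mirrors the observation that the hydrophobic positions are conserved while the hydrophilic positions vary, so a class-level comparison can recover the period even when the residues themselves differ between repeats.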


Summary

Introduction

Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Surface membrane proteins with repeating sequence motifs abound even among minimalist organisms such as Mollicutes (phylum Tenericutes, termed mycoplasmas), a clade of wall-less monoderms with minimal-size, low G+C genomes and parasitic lifestyles. These products are most commonly encoded by families of accessory genes [10,11] specific to a particular clade or individual taxon, in which distinctive repeats are encoded by individual genes [12,13,14]. The distinctive sequence diversity in this repeat pattern, its demonstrated expression in two known surface membrane proteins, and the prospect that the coding sequence is disseminated horizontally prompted its further examination as a model for the acquisition of a versatile coding module contributing to proteomic diversity.
