Abstract

Clustered regularly interspaced short palindromic repeats and CRISPR-associated protein (CRISPR-Cas) systems store the memory of past encounters with foreign DNA in unique spacers that are inserted between direct repeats in CRISPR arrays. For only a small fraction of the spacers, homologous sequences, called protospacers, are detectable in viral, plasmid, and microbial genomes. The rest of the spacers remain the CRISPR “dark matter.” We performed a comprehensive analysis of the spacers from all CRISPR-cas loci identified in bacterial and archaeal genomes, and we found that, depending on the CRISPR-Cas subtype and the prokaryotic phylum, protospacers were detectable for 1% to about 19% of the spacers (~7% global average). Among the detected protospacers, the majority, typically 80 to 90%, originated from viral genomes, including proviruses, and among the rest, the most common source was genes that are integrated into microbial chromosomes but are involved in plasmid conjugation or replication. Thus, almost all spacers with identifiable protospacers target mobile genetic elements (MGE). The GC content, as well as dinucleotide and tetranucleotide compositions, of microbial genomes, their spacer complements, and the cognate viral genomes showed a nearly perfect correlation and were almost identical. Given the near absence of self-targeting spacers, these findings are most compatible with the possibility that the spacers, including the dark matter, are derived almost completely from the species-specific microbial mobilomes.
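The compositional comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`gc_content`, `kmer_frequencies`, `pearson`) are hypothetical, only the standard library is used, and the use of a Pearson correlation is an assumption, since the abstract does not say which correlation measure was applied. With k=2 the frequency vector gives the dinucleotide composition, and with k=4 the tetranucleotide composition.

```python
from collections import Counter
from itertools import product
from math import sqrt


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k k-mers, in lexicographic order.

    Windows containing ambiguous bases (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (assumed measure)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)
```

To compare a genome with its spacer complement, one could compute `kmer_frequencies` for each and pass the two vectors to `pearson`. Counting k-mers per spacer and summing the counts, rather than concatenating spacers, avoids creating artificial k-mers across spacer boundaries.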

Highlights

  • Clustered regularly interspaced short palindromic repeats and CRISPR-associated protein (CRISPR-Cas) systems store the memory of past encounters with foreign DNA in unique spacers that are inserted between direct repeats in CRISPR arrays

  • We compared the features of the dark matter spacer sequences with those of the spacers with matches, as well as with those of host and virus genomes. The results of these analyses suggested that all spacers in the CRISPR arrays from sequenced bacterial and archaeal genomes originated from the pool of mobile genetic elements (MGE) associated with the genome in which the given CRISPR-cas locus resides and with its close relatives

  • In order to explore the origins of CRISPR spacers, a computational pipeline was developed that identified all CRISPR arrays from complete and partial bacterial and archaeal genomes


Summary

Introduction

CRISPR-Cas (clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins) systems are adaptive (acquired) immune systems of archaea and bacteria that store the memory of past encounters with foreign DNA in unique spacers inserted between direct repeats in CRISPR arrays. The spacers are sequences that are excised from viral and plasmid genomes by the Cas adaptation machinery or, alternatively, reverse transcribed from foreign RNA, and then inserted into CRISPR arrays [1,2,3,4]. The CRISPR defense function is mediated by these spacers, which are transcribed and employed as guides to identify and inactivate the cognate parasitic genomes. In the third and final stage, interference, the effector Cas protein complex mediates recognition of the target DNA or RNA via base-pairing between the spacer and the cognate protospacer, followed by cleavage of the target by Cas nucleases [19,20,21,22,23,24].
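The notion of a spacer matching its cognate protospacer can be sketched as a simple sequence scan. This is only an illustration under stated assumptions, not the search method used in the study: `find_protospacers` and `reverse_complement` are hypothetical names, the scan is a naive Hamming-distance comparison on both strands, and the `max_mismatches` threshold is an arbitrary parameter. Real analyses would normally rely on a dedicated sequence-similarity search tool.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence over the A, C, G, T alphabet."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(spacer, target, max_mismatches=0):
    """Return (position, strand, mismatches) for each candidate protospacer.

    Positions are 0-based offsets on the forward strand of `target`.
    A "-" strand hit means the target contains the spacer's reverse complement.
    """
    spacer = spacer.upper()
    target = target.upper()
    n = len(spacer)
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        for i in range(len(target) - n + 1):
            window = target[i:i + n]
            mismatches = sum(1 for a, b in zip(query, window) if a != b)
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return hits
```

For example, `find_protospacers("ACGGA", "TTTACGGATTT")` reports a single exact hit on the forward strand at offset 3. The scan takes time proportional to the product of the spacer and target lengths, which is acceptable for a toy example but not for genome-scale searches.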

