Abstract
Clustered regularly interspaced short palindromic repeats and CRISPR-associated protein (CRISPR-Cas) systems store the memory of past encounters with foreign DNA in unique spacers that are inserted between direct repeats in CRISPR arrays. For only a small fraction of the spacers, homologous sequences, called protospacers, are detectable in viral, plasmid, and microbial genomes. The rest of the spacers remain the CRISPR “dark matter.” We performed a comprehensive analysis of the spacers from all CRISPR-cas loci identified in bacterial and archaeal genomes, and we found that, depending on the CRISPR-Cas subtype and the prokaryotic phylum, protospacers were detectable for 1% to about 19% of the spacers (~7% global average). Among the detected protospacers, the majority, typically 80 to 90%, originated from viral genomes, including proviruses, and among the rest, the most common source was genes that are integrated into microbial chromosomes but are involved in plasmid conjugation or replication. Thus, almost all spacers with identifiable protospacers target mobile genetic elements (MGE). The GC content, as well as dinucleotide and tetranucleotide compositions, of microbial genomes, their spacer complements, and the cognate viral genomes showed a nearly perfect correlation and were almost identical. Given the near absence of self-targeting spacers, these findings are most compatible with the possibility that the spacers, including the dark matter, are derived almost completely from the species-specific microbial mobilomes.
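The compositional comparison mentioned in the abstract (GC content plus dinucleotide and tetranucleotide frequencies) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `gc_content`, `kmer_freqs`, and `pearson` are hypothetical, and comparing two frequency vectors with a Pearson coefficient is only one plausible way to quantify how similar a spacer set is to a host or viral genome.

```python
from collections import Counter
from itertools import product
from math import sqrt


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def kmer_freqs(seq, k):
    """Relative frequency of every possible k-mer over ACGT, in a fixed order.

    k=2 gives the dinucleotide composition, k=4 the tetranucleotide one.
    Windows containing ambiguous bases (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (NaN if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)
```

In use, one would concatenate (or pool counts over) all spacers of a genome, compute `kmer_freqs` for the spacers, the host genome, and a candidate viral genome, and compare the resulting vectors pairwise with `pearson`.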
Highlights
Clustered regularly interspaced short palindromic repeats and CRISPR-associated protein (CRISPR-Cas) systems store the memory of past encounters with foreign DNA in unique spacers that are inserted between direct repeats in CRISPR arrays.
We compared the features of the dark matter spacer sequences with those of the spacers with matches, as well as with those of host and virus genomes. The results of these analyses suggested that all spacers in the CRISPR arrays from sequenced bacterial and archaeal genomes originated from the pool of mobile genetic elements (MGE) associated with the genome in which the given CRISPR-cas locus resides and with its close relatives.
To explore the origins of CRISPR spacers, a computational pipeline was developed that identified all CRISPR arrays from complete and partial bacterial and archaeal genomes.
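The relationship between repeats, spacers, and protospacers can be made concrete with a toy Python sketch. The details of the actual pipeline are not given here, so everything below is an assumption for illustration: `extract_spacers` and `find_protospacer` are hypothetical helpers, the repeat is assumed to be known and identical in every copy, and only exact matches are searched, whereas a real analysis would use a similarity search that tolerates mismatches.

```python
# Translation table used to complement DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence over A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def extract_spacers(array_seq, repeat):
    """Return the spacers lying between consecutive copies of the direct repeat.

    Assumes perfectly conserved repeats; the flanking sequence before the
    first repeat and after the last repeat is discarded.
    """
    parts = array_seq.upper().split(repeat.upper())
    return [p for p in parts[1:-1] if p]


def find_protospacer(spacer, target):
    """Locate an exact match of the spacer on either strand of the target.

    Returns (position, strand) with strand "+" or "-", or None if absent.
    """
    spacer, target = spacer.upper(), target.upper()
    pos = target.find(spacer)
    if pos != -1:
        return pos, "+"
    pos = target.find(reverse_complement(spacer))
    if pos != -1:
        return pos, "-"
    return None
```

A spacer for which `find_protospacer` returns `None` against every available viral, plasmid, and microbial sequence would, in this simplified picture, count as part of the "dark matter."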
Summary
CRISPR-Cas (clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins) systems are adaptive (acquired) immune systems of archaea and bacteria that store the memory of past encounters with foreign DNA in unique spacer sequences. These sequences are excised from viral and plasmid genomes by the Cas adaptation machinery or, alternatively, reverse transcribed from foreign RNA, and are inserted between direct repeats in CRISPR arrays [1,2,3,4]. The CRISPR defense function is mediated by these spacers, which are transcribed and employed as guides to identify and inactivate the cognate parasitic genomes. In the third and final stage, interference, the effector Cas protein complex mediates recognition of the target DNA or RNA via base-pairing between the spacer and cognate protospacer, followed by cleavage of the target by Cas nucleases [19,20,21,22,23,24].