Abstract

CRISPR arrays contain spacers, some of which are homologous to genome segments of viruses and other parasitic genetic elements and are used as part of guide RNAs to recognize and specifically inactivate the target genomes. However, only a small fraction of the spacers in sequenced CRISPR arrays reliably match protospacer sequences in genomic databases, which leaves the origin(s) of the great majority of spacers an open question. Here, we extend the spacer analysis by examining the distribution of partial matches (matching k-mers) between spacers and both the genomes of viruses infecting the given host and the host genomes themselves. The results indicate that most of the spacers originate from host-specific viromes, whereas self-targeting is strongly selected against. However, we also present evidence that the vast majority of the viruses comprising these viromes currently remain unknown, although they are likely to be related to identified viruses.
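The partial-match idea can be illustrated with a minimal Python sketch: it collects the distinct k-mers of a spacer and reports which of them also occur on either strand of a candidate genome, plus the fraction of spacer k-mers matched. The k-mer length (k = 8 by default), the restriction to exact k-mer matches, the two-strand search, and the helper names `shared_kmers` and `kmer_match_fraction` are illustrative assumptions, not the parameters or code used in the study.

```python
# Minimal sketch of partial spacer matching via shared k-mers.
# Assumptions (illustrative, not from the study): k = 8 by default, exact k-mer
# matches only, both genome strands searched, hypothetical helper names.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(spacer: str, genome: str, k: int = 8) -> set[str]:
    """Spacer k-mers that occur on either strand of the genome."""
    genome_kmers = kmers(genome, k) | kmers(reverse_complement(genome), k)
    return kmers(spacer, k) & genome_kmers


def kmer_match_fraction(spacer: str, genome: str, k: int = 8) -> float:
    """Fraction of the spacer's distinct k-mers found in the genome (0.0 to 1.0)."""
    spacer_kmers = kmers(spacer, k)
    if not spacer_kmers:  # spacer shorter than k: no k-mers to compare
        return 0.0
    return len(shared_kmers(spacer, genome, k)) / len(spacer_kmers)


if __name__ == "__main__":
    spacer = "ACGTTGCAAGGCTTACGATCGGATCCA"
    genome = "TTTTACGTTGCAAGGCTTTTTTTGGATCCAGGG"
    print(sorted(shared_kmers(spacer, genome)))
    print(kmer_match_fraction(spacer, genome))
```

Computing `kmer_match_fraction` for every spacer against a set of viral genomes and against the host genome is one way to obtain a per-spacer distribution of partial matches. The set-based approach is simple but keeps all genome k-mers in memory, so large databases would call for an index instead.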

Highlights

  • CRISPR arrays contain spacers, some of which are homologous to genome segments of viruses and other parasitic genetic elements and are used as part of guide RNAs to recognize and inactivate the target genomes

  • Matches between spacers in CRISPR arrays and protospacers in virus genomes constituted the host–array–virus links that were used to build the dataset for the detailed spacerome analysis

  • The 154 genomes linked to viruses contained 392 CRISPR arrays with 10,555 individual spacers


Summary

Introduction

CRISPR arrays contain spacers, some of which are homologous to genome segments of viruses and other parasitic genetic elements and are used as part of guide RNAs to recognize and inactivate the target genomes. A recent comprehensive survey of CRISPR spacers has shown that most of the identifiable protospacers originate from viruses, proviruses, or other mobile genetic elements[7]. Because protospacer identification relies on comparing short (20–40 nt) nucleotide sequences, the search must be highly restrictive to avoid spurious matches, allowing at most one or two mismatches. Under this strict criterion, searching the available genomic databases detected protospacers for fewer than 10% of the CRISPR spacers[7]. Most of the "dark matter" spacers appear to originate from the dominant but still unknown, and most likely host-specific, segment of the mobilome.
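The strict matching criterion can be sketched as an ungapped, brute-force scan in Python: every genome window of spacer length is compared with the spacer and with its reverse complement, and windows with at most `max_mismatches` substitutions are reported. The function names, the default `max_mismatches=2`, the absence of gaps and of any PAM check, and the brute-force strategy are assumptions for illustration only; they are not the search procedure of the cited survey.

```python
# Minimal sketch of a strict protospacer search: report genome windows that differ
# from the spacer by at most `max_mismatches` substitutions, on both strands.
# Assumptions (illustrative, not from the study): ungapped Hamming distance,
# no PAM check, brute-force scan, hypothetical function names.

from typing import NamedTuple, Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Hit(NamedTuple):
    strand: str      # "+": spacer matches the given strand; "-": the opposite strand
    start: int       # 0-based start of the matching window on the given strand
    mismatches: int  # number of substitutions in the window


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming_within(a: str, b: str, limit: int) -> Optional[int]:
    """Hamming distance of equal-length a and b, or None as soon as it exceeds limit."""
    distance = 0
    for x, y in zip(a, b):
        if x != y:
            distance += 1
            if distance > limit:
                return None
    return distance


def find_protospacers(spacer: str, genome: str, max_mismatches: int = 2) -> list[Hit]:
    """All genome windows matching the spacer with <= max_mismatches substitutions."""
    m = len(spacer)
    rc_spacer = reverse_complement(spacer)
    hits: list[Hit] = []
    for start in range(len(genome) - m + 1):
        window = genome[start:start + m]
        # Comparing the reverse-complemented spacer with the given strand
        # is equivalent to searching the opposite strand.
        for strand, query in (("+", spacer), ("-", rc_spacer)):
            d = hamming_within(query, window, max_mismatches)
            if d is not None:
                hits.append(Hit(strand, start, d))
    return hits


if __name__ == "__main__":
    spacer = "ACGTTGCAAGGC"
    genome = "TTTACGTTGCTAGGCTTT"  # contains the spacer with one substitution
    for hit in find_protospacers(spacer, genome, max_mismatches=1):
        print(hit)
```

The default of two mismatches mirrors the "one or two mismatches at most" criterion. The scan costs roughly genome length times spacer length per spacer, which is why searches over whole genomic databases rely on indexed alignment tools rather than a loop like this.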
