Abstract

Background

Much effort is underway to build and upgrade databases and tools related to the occurrence, diversity, and characterization of CRISPR-Cas systems. As microbial communities and their genome complements are unearthed, much emphasis has been placed on details of individual strains and model systems within the CRISPR-Cas classification. Taken as a whole, that collection of information affords the opportunity to analyze CRISPR-Cas systems from a quantitative perspective and to gain insight into the distribution of CRISPR array sizes across the different classes, types, and subtypes. Research on CRISPR diversity, nomenclature, occurrence, and biological functions has generated a plethora of data. This creates a need to understand the size and distribution of these various systems in order to appreciate their features and complexity.

Results

Using a statistical framework and visual analytic techniques, we tested several hypotheses about CRISPR loci in bacterial Class I systems. Although CRISPR loci can expand to hundreds of spacers, the mean and median array sizes are 40 and 25 spacers, respectively, reflecting rather modest acquisition and/or retention overall. Histograms revealed that CRISPR array size follows a parametric distribution, which was confirmed by a goodness-of-fit test. Mapping the frequency of CRISPR loci on a standardized chromosome plot revealed that CRISPRs have a higher probability of occurring at clustered locations along the positive or negative strand. Lastly, when multiple arrays occur in a particular system, the size of a given CRISPR array varies with its distance from the cas operon, reflecting acquisition and expansion biases.

Conclusions

This study establishes three findings. Bacterial Class I CRISPR array size tends to follow a geometric distribution. These CRISPRs are not randomly distributed along the chromosome. The CRISPR array closest to the cas genes is typically larger than loci in trans. Overall, we provide an analytical framework for understanding the features and behavior of CRISPR-Cas systems through a quantitative lens.

Reviewers

This article was reviewed by Eugene Koonin (NIH-NCBI) and Uri Gophna (Tel Aviv University).
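The abstract states that the geometric fit was confirmed by a goodness-of-fit test but does not name the test or the binning used. This is a minimal sketch under assumptions. It uses a Pearson chi-square test against a geometric distribution on {1, 2, 3, ...} spacers, with the success probability estimated by maximum likelihood as p̂ = 1 / (mean array size). The function names, the bin edges, and the toy input are hypothetical. The code is not the authors' procedure.

```python
def fit_geometric(sizes):
    """Maximum-likelihood estimate of p for a geometric distribution on {1, 2, 3, ...}.

    For this support the MLE is p_hat = 1 / sample mean = n / sum(sizes).
    """
    return len(sizes) / sum(sizes)


def chi_square_geometric(sizes, bin_edges):
    """Pearson chi-square statistic for observed array sizes vs. a fitted geometric.

    bin_edges are increasing lower bounds starting at 1, e.g. [1, 11, 21, 41];
    bin i covers [bin_edges[i], bin_edges[i + 1]) and the last bin is open-ended,
    so the bins partition the whole support {1, 2, 3, ...}.
    Returns (statistic, degrees_of_freedom).
    """
    if bin_edges[0] != 1:
        raise ValueError("the first bin must start at 1 spacer")
    p = fit_geometric(sizes)
    q = 1.0 - p
    n = len(sizes)
    stat = 0.0
    for i, lo in enumerate(bin_edges):
        hi = bin_edges[i + 1] if i + 1 < len(bin_edges) else None
        observed = sum(1 for s in sizes if s >= lo and (hi is None or s < hi))
        # Geometric survival: P(X >= k) = q**(k - 1), so
        # P(lo <= X < hi) = q**(lo - 1) - q**(hi - 1); the open last bin is P(X >= lo).
        prob = q ** (lo - 1) - (q ** (hi - 1) if hi is not None else 0.0)
        expected = n * prob
        if expected == 0:
            raise ValueError("a bin has zero expected count; merge bins")
        stat += (observed - expected) ** 2 / expected
    # Degrees of freedom: number of bins - 1, minus 1 for the estimated parameter p.
    dof = len(bin_edges) - 2
    return stat, dof


# Toy input for illustration only (not data from the study).
toy_sizes = [1, 1, 2, 4]
print(chi_square_geometric(toy_sizes, [1, 2, 3]))  # → (0.0, 1)
```

If SciPy is available, `scipy.stats.chi2.sf(stat, dof)` converts the statistic into a p-value. Pearson's approximation is usually trusted only when every bin's expected count is at least about 5. For real array-size data, the sparse bins in the long right tail should therefore be merged into the open-ended last bin.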

Highlights

  • Much effort is underway to build and upgrade databases and tools related to the occurrence, diversity, and characterization of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-CRISPR-associated (Cas) systems

  • Distribution of CRISPR array size: as a first step, descriptive statistics were computed to characterize the distribution of CRISPR array sizes, measured as the number of spacers per array, in Class I CRISPR-CRISPR-associated (Cas) systems on bacterial chromosomes

  • A statistical framework and visual analytics were utilized to examine CRISPR array sizes based on the number of spacers in a locus
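The descriptive step named in the highlights can be sketched with Python's standard library. The sketch assumes each array has already been reduced to a single integer spacer count, for example from a CRISPR detection tool's output. The function names, the 10-spacer bin width, and the toy values are hypothetical. Comparing the mean with the median is a quick skew check. A mean well above the median, as with the reported 40 and 25 spacers, indicates a long right tail of large arrays.

```python
import statistics


def describe_array_sizes(spacer_counts):
    """Descriptive statistics for CRISPR array sizes given as spacers per array."""
    return {
        "n_arrays": len(spacer_counts),
        "mean": statistics.mean(spacer_counts),
        "median": statistics.median(spacer_counts),
        "min": min(spacer_counts),
        "max": max(spacer_counts),
    }


def size_histogram(spacer_counts, bin_width=10):
    """Count arrays per size bin, keyed by the bin's lower edge.

    With bin_width=10 the bins are 1-10, 11-20, 21-30, ... spacers.
    """
    counts = {}
    for s in spacer_counts:
        lower = ((s - 1) // bin_width) * bin_width + 1
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Toy input for illustration only (not data from the study).
toy = [1, 10, 11, 25]
print(describe_array_sizes(toy))
print(size_histogram(toy))  # → {1: 2, 11: 1, 21: 1}
```

The histogram counts are what a bar chart of array sizes would plot. A steadily decaying bar height across bins is the visual cue that motivates testing a geometric fit.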


Summary

Introduction

Much effort is underway to build and upgrade databases and tools related to the occurrence, diversity, and characterization of CRISPR-Cas systems. Research on CRISPR diversity, nomenclature, occurrence, and biological functions has generated a plethora of data. This creates a need to understand the size and distribution of these various systems in order to appreciate their features and complexity. An increasing amount of bacterial genome data is available in GenBank. However, the subset of organisms that have been sequenced does not represent the phylogenetic tree evenly and is biased, for understandable reasons, towards pathogenic species. This well-documented bias can influence survey-type studies and may lead to conclusions that may or may not apply throughout the tree of life.
