Abstract

Background: Genome editing technologies developed around the CRISPR-Cas9 nuclease system have facilitated the investigation of a broad range of biological questions. These nucleases also hold tremendous promise for treating a variety of genetic disorders. For therapeutic applications, it is important to identify the spectrum of genomic sequences that a candidate nuclease cleaves when programmed with a particular guide RNA, as well as the cleavage efficiency at each of these sites. Powerful new experimental approaches, such as GUIDE-seq, facilitate the sensitive, unbiased, genome-wide detection of nuclease cleavage sites. Flexible bioinformatics analysis tools for processing GUIDE-seq data are needed.

Results: Here, we describe an open source, open development software suite, GUIDEseq, for GUIDE-seq data analysis and annotation as a Bioconductor package in R. The GUIDEseq package provides a flexible platform with more than 60 adjustable parameters for the analysis of datasets associated with custom nuclease applications. These parameters allow data analysis to be tailored to nuclease platforms that differ in the length and complexity of their guide and PAM recognition sequences or in their DNA cleavage position. They also enable users to customize sequence aggregation criteria and to vary peak calling thresholds, which can influence the number of potential off-target sites recovered. GUIDEseq also annotates potential off-target sites that overlap with genes based on genome annotation information, as these may be the most important off-target sites for further characterization. In addition, GUIDEseq enables the comparison and visualization of off-target site overlap between different datasets, for a rapid comparison of different nuclease configurations or experimental conditions. For each identified off-target, the GUIDEseq package outputs the mapped GUIDE-seq read count as well as the cleavage score from a user-specified off-target cleavage score prediction algorithm, permitting the identification of genomic sequences with unexpected cleavage activity.

Conclusion: The GUIDEseq package enables analysis of GUIDE-seq data from various nuclease platforms for any species with a defined genomic sequence. This software package has been used successfully to analyze several GUIDE-seq datasets. The software, source code and documentation are freely available at http://www.bioconductor.org/packages/release/bioc/html/GUIDEseq.html.

Highlights

  • Genome editing technologies developed around the Clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 nuclease system have facilitated the investigation of a broad range of biological questions

  • S. pyogenes Cas9 (SpCas9)-based nucleases can cleave an imperfect heteroduplex formed between the guide sequence and a DNA sequence containing a functional Protospacer Adjacent Motif (PAM) [9,10,11,12,13,14,15,16], where the number, position and type of base mismatches impact its level of activity [11, 12, 16]

  • To evaluate the performance of our GUIDEseq analysis package, we analyzed several datasets produced in house, successfully identified the intended target sites, and validated GUIDE-seq-identified off-targets using deep sequencing of PCR amplicons spanning these genomic loci from nuclease-treated cells [25]


Summary

Results

Analysis of published GUIDE-seq dataset

To evaluate the performance of our GUIDEseq analysis package, we analyzed several datasets produced in house, successfully identified the intended target sites, and validated GUIDE-seq-identified off-targets using deep sequencing of PCR amplicons spanning these genomic loci from nuclease-treated cells [25]. We also analyzed a dataset generously supplied by the Joung laboratory (HEK293 guide 4) and compared our list of identified off-target sites with their previously published analysis [19]. In this comparison, the number of potential off-target sites and the unique reads associated with each peak (their rank order) are very similar (Additional file 4: Table S2). When analyzing SpCas9 data, only a small number of target-specific inputs are required from users. A detailed description of these parameters and the input files is available at http://bioconductor.org/packages/release/bioc/manuals/GUIDEseq/man/GUIDEseq.pdf. If a control sample without nuclease is available, peaks present in the control sample can be removed from the gRNA samples by setting the control.sample.name
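The idea of removing control-sample peaks from a nuclease-treated sample can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the GUIDEseq package's R implementation: the `Peak` record, the `overlaps` and `remove_control_peaks` helpers, the `slack` tolerance, and the example coordinates are all hypothetical, and it assumes that a treated peak is discarded whenever it overlaps any control peak on the same chromosome.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    """Hypothetical peak record; not a GUIDEseq data structure."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    reads: int  # unique reads supporting the peak


def overlaps(a: Peak, b: Peak, slack: int = 0) -> bool:
    """Return True if both peaks lie on the same chromosome and their
    intervals overlap, or come within `slack` bp of each other."""
    return (
        a.chrom == b.chrom
        and a.start < b.end + slack
        and b.start < a.end + slack
    )


def remove_control_peaks(
    treated: List[Peak], control: List[Peak], slack: int = 0
) -> List[Peak]:
    """Keep only treated peaks that overlap no control peak (assumed rule)."""
    return [
        p for p in treated
        if not any(overlaps(p, c, slack) for c in control)
    ]


# Made-up example coordinates, for illustration only.
treated = [
    Peak("chr1", 1000, 1040, 250),
    Peak("chr2", 5000, 5040, 90),
    Peak("chr7", 300, 340, 12),
]
control = [Peak("chr2", 5010, 5050, 30)]

filtered = remove_control_peaks(treated, control)
for peak in filtered:
    print(peak.chrom, peak.start, peak.end, peak.reads)
```

In this toy input, the chr2 peak is dropped because it also appears in the no-nuclease control, which suggests a nuclease-independent source of reads. The criteria the package actually applies are described in its manual.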

