Abstract

The adaptation of the CRISPR-Cas9 system as a genome editing technique has generated much excitement in recent years owing to its ability to manipulate targeted genes and genomic regions that are complementary to a programmed single guide RNA (sgRNA). However, the efficacy of a specific sgRNA is not uniquely defined by exact sequence homology to the target site, thus unintended off-targets might additionally be cleaved. Current methods for sgRNA design are mainly concerned with predicting off-targets for a given sgRNA using basic sequence features and employ elementary rules for ranking possible sgRNAs. Here, we introduce CRISTA (CRISPR Target Assessment), a novel algorithm within the machine learning framework that determines the propensity of a genomic site to be cleaved by a given sgRNA. We show that the predictions made with CRISTA are more accurate than other available methodologies. We further demonstrate that the occurrence of bulges is not a rare phenomenon and should be accounted for in the prediction process. Beyond predicting cleavage efficiencies, the learning process provides inferences regarding patterns that underlie the mechanism of action of the CRISPR-Cas9 system. We discover that attributes that describe the spatial structure and rigidity of the entire genomic site as well as those surrounding the PAM region are a major component of the prediction capabilities.

Highlights

  • The CRISPR-Cas9 system, a microbial adaptive immune system, was recently exploited for modulating DNA sequences within the endogenous genome in many organisms. This system has emerged as a technology of choice for genome editing with promising therapeutic and research advancements. These exciting developments were not paralleled by a deep understanding of CRISPR-Cas9 cleavage efficiency.

  • Allowing gaps in the pairwise sequence alignment affected 18% of the targets in the training dataset: 87 of 491 sites contain bulges, 1.1 per site on average. This resulted in r² = 0.34 averaged over the single guide RNA (sgRNA) datasets, compared to r² = 0.27 when gaps are not allowed.

  • Our analysis further demonstrated that the datasets obtained with high-throughput genome-wide translocation sequencing (HTGTS) for unique sgRNAs are not comparable with those obtained with the other platforms.
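To make the role of gaps concrete, the sketch below aligns an sgRNA against a candidate genomic site with a plain Needleman-Wunsch global alignment and counts mismatches and bulges (gapped columns). It is only an illustration of what "introducing gaps to the pairwise alignment" means: the function names, the example sequences, and the scoring values (match = 1, mismatch = -1, gap = -2) are assumptions chosen for this example, not the alignment procedure or parameters used by CRISTA.

```python
def align_with_bulges(sgrna, site, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment of an sgRNA to a genomic site.

    Scoring values are illustrative assumptions, not CRISTA's parameters.
    Returns the two aligned strings, with '-' marking a gap (bulge).
    """
    n, m = len(sgrna), len(site)

    def sub(i, j):
        # Substitution score for aligning sgrna[i-1] with site[j-1].
        return match if sgrna[i - 1] == site[j - 1] else mismatch

    # Dynamic-programming table of best scores for each prefix pair.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                score[i - 1][j] + gap,            # gap in the site row
                score[i][j - 1] + gap,            # gap in the sgRNA row
            )

    # Trace back from the bottom-right corner, preferring diagonal moves.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            a.append(sgrna[i - 1]); b.append(site[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(sgrna[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(site[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def count_mismatches_and_bulges(aligned_sgrna, aligned_site):
    """Count mismatched columns and gapped columns (bulges) in an alignment."""
    pairs = list(zip(aligned_sgrna, aligned_site))
    mismatches = sum(1 for x, y in pairs if "-" not in (x, y) and x != y)
    bulges = sum(1 for x, y in pairs if "-" in (x, y))
    return mismatches, bulges


# Hypothetical toy sequences: the site lacks one base present in the sgRNA.
aligned = align_with_bulges("ACGTACGT", "ACGTCGT")
print(aligned, count_mismatches_and_bulges(*aligned))
```

With gaps disallowed, a site like the toy one above could only be compared to the sgRNA as a run of mismatches (or not at all, given the length difference), which is one way to see why accounting for bulges can change which sites look like plausible targets.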

Introduction

Several experimental methods for unbiased genome-wide profiling of off-targets have been introduced, including those based on integration of oligonucleotides into double-strand breaks detected by sequencing (GUIDE-Seq) [16,17,18], high-throughput genome-wide translocation sequencing (HTGTS) [19], direct in situ breaks labelling (BLESS) [20,21], integration-deficient lentiviral vectors (IDLV) [22], and in vitro nuclease-digested whole-genome sequencing (Digenome-seq) [23,24]. These studies demonstrated that CRISPR off-targets can be located at unexpected sites, such as sites that harbor alternative PAM sequences and sites that contain a large number of mismatches, and that some off-targets were cleaved at higher frequencies than the intended on-targets. It is becoming clear that an intricate set of attributes plays a role in CRISPR-Cas function.
