Abstract

Background: Identifying essential genes in genome-wide loss-of-function screens is a critical step in functional genomics and cancer target finding. We previously described the Bayesian Analysis of Gene Essentiality (BAGEL) algorithm for accurate classification of gene essentiality from short hairpin RNA and CRISPR/Cas9 genome-wide genetic screens.

Results: We introduce an updated version, BAGEL2, which employs an improved model that offers a greater dynamic range of Bayes Factors, enabling detection of tumor suppressor genes; a multi-target correction that reduces false positives from off-target CRISPR guide RNAs; and a cross-validation strategy that improves performance ~10× over the prior bootstrap resampling approach. We also describe a metric for screen quality at the replicate level and demonstrate how different algorithms handle lower-quality data in substantially different ways.

Conclusions: BAGEL2 substantially improves the sensitivity, specificity, and performance over BAGEL and establishes the new state of the art in the analysis of CRISPR knockout fitness screens. BAGEL2 is written in Python 3, and the source code, along with all supporting files, is available on GitHub (https://github.com/hart-lab/bagel).
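The cross-validation idea can be illustrated with a minimal sketch. It assumes guide-level log2 fold changes are already available for each gene, models the essential and non-essential fold-change distributions with a fixed-bandwidth Gaussian kernel density estimate, and scores each gene as the sum of its guides' log2 likelihood ratios. The fold assignment, the bandwidth, the function names, and the averaging of scores for genes outside the training sets are illustrative assumptions, not BAGEL2's actual code. The key property is that each training gene is scored only by a model fit on folds that exclude it, so its own guides never inform its Bayes Factor.

```python
import numpy as np


def kde_logpdf(x, samples, bw=0.25):
    """Log density of a Gaussian KDE with a fixed bandwidth (assumed, not tuned)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    s = np.asarray(samples, dtype=float)
    z = (x[:, None] - s[None, :]) / bw
    dens = np.exp(-0.5 * z ** 2).sum(axis=1) / (s.size * bw * np.sqrt(2 * np.pi))
    return np.log(np.maximum(dens, 1e-300))  # floor avoids log(0) far in the tails


def gene_bf(fcs, ess_fc, non_fc, bw=0.25):
    """Gene-level log2 BF: sum of guide-level log2 likelihood ratios."""
    llr = kde_logpdf(fcs, ess_fc, bw) - kde_logpdf(fcs, non_fc, bw)
    return float(llr.sum() / np.log(2))


def cross_validated_bf(guide_fc, essentials, nonessentials, k=10, seed=0):
    """Score all genes; each training gene is scored by a model that excludes it."""
    rng = np.random.default_rng(seed)
    train = sorted(set(essentials) | set(nonessentials))
    # Randomly assign training genes to k folds of near-equal size.
    fold_of = {train[j]: i % k for i, j in enumerate(rng.permutation(len(train)))}
    others = {g: [] for g in guide_fc if g not in fold_of}
    bf = {}
    for f in range(k):
        # Fit the two reference distributions on every fold except f.
        ess_fc = np.concatenate([guide_fc[g] for g in essentials if fold_of[g] != f])
        non_fc = np.concatenate([guide_fc[g] for g in nonessentials if fold_of[g] != f])
        for g, fold in fold_of.items():
            if fold == f:  # held-out training genes
                bf[g] = gene_bf(guide_fc[g], ess_fc, non_fc)
        for g in others:  # genes outside the training sets: one score per fold
            others[g].append(gene_bf(guide_fc[g], ess_fc, non_fc))
    for g, scores in others.items():
        bf[g] = float(np.mean(scores))
    return bf
```

In this sketch only k models are fit in total, whereas a bootstrap scheme refits the model once per resampling iteration.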

Highlights


  • The analysis pipeline for a loss-of-function fitness screen consists of three steps: (1) mapping reads to the guide sequences in the CRISPR library and building a table of read counts, (2) normalizing counts across samples and calculating guide-level fold change, and (3) compiling guide-level information into gene-level fitness scores (Fig. 1a)

  • The “essential” model is represented by a kernel density estimate (KDE) of the distribution of guide-level fold changes of gRNAs targeting a training set of essential genes [14], and the “non-essential” model is likewise trained on a set of non-essential genes (Fig. 1b) [12, 14]
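The pipeline's fold-change step and the KDE-based essential/non-essential models can be sketched as follows. This is an illustration under stated assumptions: the normalization to a fixed read depth, the pseudocount, the fixed KDE bandwidth, the fold-change range used for the fit, and all function names are hypothetical, and fitting a straight line to the log likelihood ratio is one reading of the "log likelihood/regression model" named in the Results, not BAGEL2's exact procedure. A fitted line gives every guide a finite, monotonic score, including guides whose fold changes fall outside the range where both KDEs are well supported.

```python
import numpy as np


def log2_fold_change(sample, control, pseudocount=5.0, depth=1e7):
    """Step 2 sketch: scale each library to a common read depth, add a pseudocount,
    and take the per-guide log2 ratio of sample to control (parameters hypothetical)."""
    s = np.asarray(sample, dtype=float)
    c = np.asarray(control, dtype=float)
    s_norm = s / s.sum() * depth + pseudocount
    c_norm = c / c.sum() * depth + pseudocount
    return np.log2(s_norm / c_norm)


def kde_pdf(x, samples, bw=0.5):
    """Gaussian KDE with a fixed bandwidth (an assumption; no bandwidth selection)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    s = np.asarray(samples, dtype=float)
    z = (x[:, None] - s[None, :]) / bw
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (s.size * bw * np.sqrt(2 * np.pi))


def fit_llr_line(ess_fc, non_fc, lo=-2.0, hi=0.0, n=50, bw=0.5):
    """Fit log2(p_ess / p_noness) = slope * fc + intercept on a grid over [lo, hi]."""
    grid = np.linspace(lo, hi, n)
    llr = np.log2(kde_pdf(grid, ess_fc, bw)) - np.log2(kde_pdf(grid, non_fc, bw))
    slope, intercept = np.polyfit(grid, llr, 1)
    return slope, intercept


def guide_bf(fc, slope, intercept):
    """Guide-level log2 BF from the fitted line; defined for any fold change."""
    return slope * np.asarray(fc, dtype=float) + intercept
```

With this sign convention, strongly depleted guides (negative fold change) receive positive Bayes Factors, so a higher score indicates stronger evidence of essentiality.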


Summary

Results

An improved log likelihood/regression model

The analysis pipeline for a loss-of-function fitness screen consists of three steps: (1) mapping reads to the guide sequences in the CRISPR library and building a table of read counts, (2) normalizing counts across samples and calculating guide-level fold change, and (3) compiling guide-level information into gene-level fitness scores (Fig. 1a).

In the ovarian endometrioid cancer cell line OVK-18, guides from the Avana library showed a multiple-targeting effect: the incremental Bayes Factor (BF) attributable to off-target cutting increased roughly linearly with the number of perfect-match off-target cut sites in the genome, and a smaller incremental guide-level BF increased with the number of mismatched off-target sites (Fig. 2a; Additional file 1: Fig. S2B). We further demonstrate that multi-targeting correction improves the agreement of gene essentiality calls across cell lines screened with both the Avana and KY libraries (Additional file 1: Fig. S3). We show that BAGEL2 can correct the multi-targeting effects from perfect-match and 1-bp mismatched targets, reducing the number of false positives arising from promiscuous sgRNAs, and that BAGEL2 accurately discriminates essential genes from non-essential genes when compared with other algorithms.

Most CRISPR data in DepMap and Project Score are of sufficiently high quality (95% of screens have quality scores > 1.0) that differences in how algorithms handle lower-quality data are not an important factor; however, researchers should be wary of including marginal-quality screens in their analyses.
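The multi-targeting correction described above can be sketched as a linear regression. In this minimal sketch, each guide's excess BF is assumed to be its BF minus the median BF of all guides targeting the same gene; that excess is regressed on the number of perfect-match off-target sites and the number of 1-bp mismatched sites, and only the fitted off-target terms are subtracted. The baseline definition and the function and argument names are assumptions for illustration, not BAGEL2's implementation.

```python
import numpy as np


def multi_target_correction(guide_bf, gene_of, n_perfect, n_mm1):
    """Hypothetical sketch: regress each guide's excess BF on its off-target site
    counts, then subtract only the predicted off-target contribution."""
    bf = np.asarray(guide_bf, dtype=float)
    genes = np.asarray(gene_of)
    # Assumed per-gene baseline: median BF of the guides targeting that gene.
    baseline = np.empty_like(bf)
    for g in np.unique(genes):
        mask = genes == g
        baseline[mask] = np.median(bf[mask])
    excess = bf - baseline
    # Design matrix: intercept, perfect-match off-target count, 1-bp mismatch count.
    X = np.column_stack([np.ones_like(bf), n_perfect, n_mm1]).astype(float)
    coef, *_ = np.linalg.lstsq(X, excess, rcond=None)
    # Remove the off-target terms only; the intercept is not a per-guide effect.
    corrected = bf - X[:, 1:] @ coef[1:]
    return corrected, coef
```

Because the median is robust to a single inflated guide, a gene with one promiscuous guide keeps a baseline close to its other guides, and the regression attributes that guide's excess to its off-target counts.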

