Abstract

BackgroundPooled library screen analysis using shRNAs or CRISPR-Cas9 hold great promise to genome-wide functional studies. While pooled library screens are effective tools, erroneous barcodes can potentially be generated during the production of many barcodes. However, no current tools can distinguish erroneous barcodes from PCR or sequencing errors in a data preprocessing step.ResultsWe developed the Barcas program, a specialized program for the mapping and analysis of multiplexed barcode sequencing (barcode-seq) data. For fast and efficient mapping, Barcas uses a trie data structure based imperfect matching algorithm which generates precise mapping results containing mismatches, shifts, insertions and deletions (indel) in a flexible manner. Barcas provides three functions for quality control (QC) of a barcode library and distinguishes erroneous barcodes from PCR or sequencing errors. It also provides useful functions for data analysis and visualization.ConclusionsBarcas is an all-in-one package providing useful functions including mapping, data QC, library QC, statistical analysis and visualization in genome-wide pooled screens.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-016-1326-9) contains supplementary material, which is available to authorized users.

Highlights

  • Pooled library screen analysis using shRNAs or CRISPR-Cas9 hold great promise to genome-wide functional studies

  • Barcode-seq is used in several genome-wide screening tools, including shRNAs for gene knock-down [3], sgRNAs for genome editing [4] and barcoded yeast deletion strains in Saccharomyces cerevisiae and Schizosaccharomyces pombe

  • Sequencing or PCR errors vs. erroneous barcodes While barcode-seq data should ideally be mapped by perfect matching, systematic errors, such as errors that occur during PCR amplification or random mutations during strain maintenance, can lead to erroneous barcode sequences that are not mapped by perfect matching

Read more

Summary

Introduction

Pooled library screen analysis using shRNAs or CRISPR-Cas hold great promise to genome-wide functional studies. No current tools can distinguish erroneous barcodes from PCR or sequencing errors in a data preprocessing step. Barcode-seq is a next-generation sequencing (NGS) technique that reads genome-integrated artificial sequences called barcodes that mark biological materials, such as cells or genes, with unique sequences [1]. Having a unique barcode facilitates tracking materials of interest in genome-wide functional screens as well as the identification of drug targets or disease-associated genes [2]. Barcode-seq is used in several genome-wide screening tools, including shRNAs for gene knock-down [3], sgRNAs for genome editing [4] and barcoded yeast deletion strains in Saccharomyces cerevisiae and Schizosaccharomyces pombe. There are many steps during barcode-seq that can generate errors: chemical synthesis of oligonucleotides, PCR amplification and NGS to name a few [1]. It is necessary to consider all kinds of possible errors when performing barcode-seq experiments

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call