Abstract
Background: Pooled library screen analysis using shRNAs or CRISPR-Cas9 holds great promise for genome-wide functional studies. While pooled library screens are effective tools, erroneous barcodes can be generated during the production of large numbers of barcodes. However, no current tools can distinguish erroneous barcodes from PCR or sequencing errors in a data preprocessing step.
Results: We developed Barcas, a specialized program for the mapping and analysis of multiplexed barcode sequencing (barcode-seq) data. For fast and efficient mapping, Barcas uses an imperfect matching algorithm based on a trie data structure, which produces precise mapping results while flexibly allowing for mismatches, shifts, insertions and deletions (indels). Barcas provides three functions for quality control (QC) of a barcode library and distinguishes erroneous barcodes from PCR or sequencing errors. It also provides useful functions for data analysis and visualization.
Conclusions: Barcas is an all-in-one package providing useful functions including mapping, data QC, library QC, statistical analysis and visualization in genome-wide pooled screens.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1326-9) contains supplementary material, which is available to authorized users.
Highlights
Pooled library screen analysis using shRNAs or CRISPR-Cas9 holds great promise for genome-wide functional studies
Barcode-seq is used in several genome-wide screening tools, including shRNAs for gene knock-down [3], sgRNAs for genome editing [4] and barcoded yeast deletion strains in Saccharomyces cerevisiae and Schizosaccharomyces pombe
Sequencing or PCR errors vs. erroneous barcodes: While barcode-seq data should ideally be mapped by perfect matching, systematic errors, such as errors that occur during PCR amplification or random mutations during strain maintenance, can lead to erroneous barcode sequences that are not mapped by perfect matching
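To make the contrast between perfect and imperfect matching concrete, the following is a minimal Python sketch of a trie lookup that tolerates a bounded number of mismatches. It is an illustration only and not the Barcas implementation: the names `TrieNode`, `build_trie` and `search`, the `max_mismatches` parameter and the two example barcodes are all assumptions, and the sketch handles substitutions only, not the shifts or indels that Barcas is described as supporting.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of the barcode trie (hypothetical helper, not from Barcas)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.barcode_id: Optional[str] = None  # set only at the end of a barcode


def build_trie(library: Dict[str, str]) -> TrieNode:
    """Insert every library barcode (id -> sequence) into a trie."""
    root = TrieNode()
    for barcode_id, seq in library.items():
        node = root
        for base in seq:
            node = node.children.setdefault(base, TrieNode())
        node.barcode_id = barcode_id
    return root


def search(root: TrieNode, read: str, max_mismatches: int = 1) -> List[Tuple[str, int]]:
    """Return (barcode_id, mismatch_count) for barcodes within the mismatch budget.

    Substitutions only: the read and the barcode must have the same length.
    """
    hits: List[Tuple[str, int]] = []

    def walk(node: TrieNode, pos: int, mismatches: int) -> None:
        if pos == len(read):
            if node.barcode_id is not None:
                hits.append((node.barcode_id, mismatches))
            return
        for base, child in node.children.items():
            cost = 0 if base == read[pos] else 1
            # Prune any branch that would exceed the mismatch budget.
            if mismatches + cost <= max_mismatches:
                walk(child, pos + 1, mismatches + cost)

    walk(root, 0, 0)
    return sorted(hits, key=lambda hit: hit[1])


# Made-up example library; real barcode libraries are far larger.
library = {"geneA": "ACGTACGT", "geneB": "TTGCAAGC"}
trie = build_trie(library)

perfect = search(trie, "ACGTACGA", max_mismatches=0)    # one substituted base
tolerant = search(trie, "ACGTACGA", max_mismatches=1)
```

With `max_mismatches=0` the search reduces to perfect matching and the read with one substituted base is left unmapped, whereas a budget of one mismatch recovers `geneA`. Because library barcodes share prefixes in the trie, a branch is abandoned as soon as its mismatch count exceeds the budget, which is what keeps this kind of lookup cheap compared with comparing each read against every barcode.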
Summary
Pooled library screen analysis using shRNAs or CRISPR-Cas holds great promise for genome-wide functional studies. No current tools can distinguish erroneous barcodes from PCR or sequencing errors in a data preprocessing step. Barcode-seq is a next-generation sequencing (NGS) technique that reads genome-integrated artificial sequences, called barcodes, that mark biological materials such as cells or genes with unique sequences [1]. Having a unique barcode facilitates tracking materials of interest in genome-wide functional screens, as well as the identification of drug targets or disease-associated genes [2]. Barcode-seq is used in several genome-wide screening tools, including shRNAs for gene knock-down [3], sgRNAs for genome editing [4] and barcoded yeast deletion strains in Saccharomyces cerevisiae and Schizosaccharomyces pombe. Many steps in barcode-seq can generate errors, including chemical synthesis of oligonucleotides, PCR amplification and NGS, to name a few [1]. It is therefore necessary to consider all possible kinds of errors when performing barcode-seq experiments.