BAR-CAT: Targeted Recovery of Synthetic Genes via Barcode-Directed CRISPR-dCas9 Enrichment.
Modern gene synthesis platforms enable investigations of protein function and genome biology at an unprecedented scale. Yet the proportion of error-free constructs in diverse gene libraries decreases with length due to the propagation of oligo synthesis errors. To rescue these error-free constructs, we developed Barcode-Assisted Retrieval CRISPR-Activated Targeting (BAR-CAT), an in vitro method that uses multiplexed dCas9-single-guide RNA (sgRNA) complexes to extract barcodes corresponding to error-free constructs. After a 15-min incubation and wash regimen, three low-abundance targets in a 300,000-member test library were enriched 600-fold, greatly reducing downstream requirements. When applied to a 384-gene DropSynth gene library, BAR-CAT enriched 12 targets up to 122-fold and revealed practical limits imposed by sgRNA competition and library complexity, which now guide ongoing protocol scaling. By eliminating laborious clone-by-clone validation and working directly on plasmid libraries, BAR-CAT provides a platform for recovering perfect synthetic genes, subsetting large libraries, and ultimately lowering the cost of functional genomics at scale.
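The abstract reports enrichment as a fold change but does not spell out the calculation. A minimal sketch, assuming fold enrichment is the ratio of a target's read fraction after pull-down to its read fraction before; the function name and the read counts are hypothetical and are not taken from the paper:

```python
def fold_enrichment(target_before, total_before, target_after, total_after):
    """Ratio of a target's read fraction after enrichment to its fraction before.

    Assumption: enrichment is measured from sequencing read counts.
    """
    if target_before <= 0 or total_before <= 0 or total_after <= 0:
        raise ValueError("counts before enrichment and both totals must be positive")
    fraction_before = target_before / total_before
    fraction_after = target_after / total_after
    return fraction_after / fraction_before


# Hypothetical counts: 10 target reads per million before, 6,000 per million after.
example = fold_enrichment(10, 1_000_000, 6_000, 1_000_000)
print(f"fold enrichment: {example:.0f}")
```

Normalizing by total reads keeps the comparison valid when the two samples are sequenced to different depths.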
- Research Article
- 10.1101/2025.06.27.658158
- Jun 30, 2025
- bioRxiv
Modern gene-synthesis platforms let us probe protein function and genome biology at unprecedented scale. Yet in large, diverse gene libraries the proportion of error-free constructs decreases with length due to the propagation of oligo synthesis errors. To rescue these rare, error-free molecules we developed BAR-CAT (Barcode-Assisted Retrieval CRISPR-Activated Targeting), an in-vitro enrichment method that couples unique PAM-adjacent 20-nt barcodes to each library member and uses multiplexed dCas9-sgRNA complexes to fish out the barcodes corresponding to perfect assemblies. After a single 15-min reaction and optimized wash regime (BAR-CAT v1.0), three low-abundance targets in a 300,000-member test library were enriched 600-fold, greatly reducing downstream requirements. When applied to 384x and 1,536x member DropSynth gene libraries, BAR-CAT retrieved up to 122-fold enrichment for 12 targets and revealed practical limits imposed by sgRNA competition and library complexity, which now guide ongoing protocol scaling. By eliminating laborious clone-by-clone validation and working directly on plasmid libraries, BAR-CAT provides a versatile platform for recovering perfect synthetic genes, subsetting large libraries, and ultimately lowering the cost of functional genomics at scale.
- Research Article
- 10.1111/1751-7915.14367
- Nov 16, 2023
- Microbial biotechnology
Large gene libraries are frequently created in Escherichia coli plasmids, which can induce cell toxicity and expression instability due to the high gene dosage. To address these limitations, gene libraries can be integrated in a single copy into the bacterial chromosome. Here, we describe an efficient system for the massive integration (MAIN) of large gene libraries in the E. coli chromosome that generates in-frame gene fusions that are expressed stably. MAIN uses a thermosensitive integrative plasmid that is linearized in vivo to promote extensive integration of the gene library via homologous recombination. Positive and negative selections efficiently remove bacteria lacking gene integration in the target site. We tested MAIN with a library of 10⁷ VHH genes that encode nanobodies (Nbs). The integration of VHH genes into a custom target locus of the E. coli chromosome enabled stable expression and surface display of the Nbs. Next-generation DNA sequencing confirmed that MAIN preserved the diversity of the gene library after integration. Finally, we screened the integrated library to select Nbs that bind a specific antigen using magnetic and fluorescence-activated cell sorting. This allowed us to identify Nbs binding the epidermal growth factor receptor that were not previously isolated in a similar screening of a multicopy plasmid library. Our results demonstrate that MAIN enables large gene library integration into the E. coli chromosome, creating stably expressed in-frame fusions for functional screening.
- Research Article
- 10.1371/journal.pone.0136778
- Sep 10, 2015
- PLOS ONE
A major challenge in gene library generation is to guarantee a large functional size and diversity that significantly increases the chances of selecting different functional protein variants. The use of trinucleotide mixtures for controlled randomization results in superior library diversity and offers the ability to specify the type and distribution of the amino acids at each position. Here we describe the generation of a high diversity gene library using tHisF of the hyperthermophile Thermotoga maritima as a scaffold. Combining various rational criteria with contingency, we targeted 26 selected codons of the thisF gene sequence for randomization at a controlled level. We have developed a novel method of creating full-length gene libraries by combinatorial assembly of smaller sub-libraries. Full-length libraries of high diversity can easily be assembled on demand from smaller and much less diverse sub-libraries, which circumvents the notoriously troublesome long-term archiving and repeated proliferation of high diversity ensembles of phages or plasmids. We developed a generally applicable software tool for sequence analysis of mutated gene sequences that provides efficient assistance for analysis of library diversity. Finally, practical utility of the library was demonstrated in principle by assessment of the conformational stability of library members and isolating protein variants with HisF activity from it. Our approach integrates a number of features of nucleic acids synthetic chemistry, biochemistry and molecular genetics to a coherent, flexible and robust method of combinatorial gene synthesis.
- Research Article
- 10.64898/2026.01.12.699065
- Jan 13, 2026
- bioRxiv
High-throughput sequencing and computational protein design have created a growing gap between the discovery of new proteins and their functional characterization. In many instances, functional characterization requires one-to-one measurements (for example, when detailed biochemical insights are desired or pooled selections are not possible), necessitating that individual variants be isolated and assayed. A major barrier to closing this gap is the cost to directly synthesize individual genes, which remains prohibitively expensive ($10–100 per sequence) and restricts these studies to small subsets of relevant variants, leaving many sequences without functional annotation. To address this, we developed user-defined Sorted Mutants (uSort-M), which combines pooled DNA synthesis, automated cell sorting of transformed Escherichia coli, and long-read sequencing to rapidly isolate and identify variants from diverse libraries. uSort-M can isolate, sequence, and validate individual variants from pooled libraries produced via diverse existing methods including multiplex assembly, error-prone PCR, or pooled site-directed mutagenesis. Sorting single bacterial clones into 384-well plates is efficient: eight plates (3,072 wells) can be filled in 1–2 hours, with up to 90% of wells yielding monoclonal cultures. Commercial long-read sequencing enables accessible, fast, and cost-effective identification of individual sequences from isolated clones while tolerating wide variation in fragment length and diversity across the library. Applying this workflow to a 328-member scanning mutagenesis library of a 300-bp gene recovered 96% of desired variants at fivefold lower cost than traditional synthesis. Numerical simulations identify key parameters governing library recovery and enable accurate prediction of the sampling effort required to achieve target coverage. As library size increases, this workflow offers substantial savings over traditional gene synthesis or cloning. Due to its generalizability, efficiency, and reliance on standard instrumentation, uSort-M removes a key barrier to large-scale protein functional characterization.
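The abstract mentions numerical simulations of library recovery without describing them. The sketch below is not the authors' simulation. It is a simplified illustration that assumes every variant is equally abundant and every sorted clone carries exactly one correct variant, and it compares the closed-form expected coverage under those assumptions with a small Monte Carlo estimate. All function names and the clone count are hypothetical.

```python
import random


def expected_coverage(n_variants, n_clones):
    """Expected fraction of variants seen at least once (uniform-library assumption)."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_clones


def simulate_coverage(n_variants, n_clones, n_trials=200, seed=0):
    """Monte Carlo estimate of the same quantity, averaged over n_trials draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Each sorted clone is an independent uniform draw from the library.
        seen = {rng.randrange(n_variants) for _ in range(n_clones)}
        total += len(seen) / n_variants
    return total / n_trials


# 328 variants as in the abstract; 1,000 sequenced clones is a hypothetical input.
print(expected_coverage(328, 1000))
print(simulate_coverage(328, 1000))
```

Real libraries are skewed and contain erroneous clones, so both would need to be modeled before the estimate could guide how many wells to sort.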
- Research Article
- 10.3791/66581
- May 17, 2024
- Journal of visualized experiments : JoVE
Functional genomics screening offers a powerful approach to probe gene function and relies on the construction of genome-wide plasmid libraries. Conventional approaches for plasmid library construction are time-consuming and laborious. Therefore, we recently developed a simple and efficient method, CRISPR-based modular assembly (CRISPRmass), for high-throughput construction of a genome-wide upstream activating sequence-complementary DNA/open reading frame (UAS-cDNA/ORF) plasmid library. Here, we present a protocol for CRISPRmass, taking as an example the construction of a GAL4/UAS-based UAS-cDNA/ORF plasmid library. The protocol includes massively parallel two-step test tube reactions followed by bacterial transformation. The first step is to linearize the existing complementary DNA (cDNA) or open reading frame (ORF) cDNA or ORF library plasmids by cutting the shared upstream vector sequences adjacent to the 5' end of cDNAs or ORFs using CRISPR/Cas9 together with single guide RNA (sgRNA), and the second step is to insert a UAS module into the linearized cDNA or ORF plasmids using a single step reaction. CRISPRmass allows the simple, fast, efficient, and cost-effective construction of various plasmid libraries. The UAS-cDNA/ORF plasmid library can be utilized for gain-of-function screening in cultured cells and for constructing a genome-wide transgenic UAS-cDNA/ORF library in Drosophila.
- Research Article
- 10.1371/journal.pone.0167634
- Dec 2, 2016
- PLoS ONE
K-shuff is a new algorithm for comparing the similarity of gene sequence libraries, providing measures of the structural and compositional diversity as well as the significance of the differences between these measures. Inspired by Ripley’s K-function for spatial point pattern analysis, the Intra K-function or IKF measures the structural diversity, including both the richness and overall similarity of the sequences, within a library. The Cross K-function or CKF measures the compositional diversity between gene libraries, reflecting both the number of OTUs shared as well as the overall similarity in OTUs. A Monte Carlo testing procedure then enables statistical evaluation of both the structural and compositional diversity between gene libraries. For 16S rRNA gene libraries from complex bacterial communities such as those found in seawater, salt marsh sediments, and soils, K-shuff yields reproducible estimates of structural and compositional diversity with libraries greater than 50 sequences. Similarly, for pyrosequencing libraries generated from a glacial retreat chronosequence and Illumina® libraries generated from US homes, K-shuff required >300 and 100 sequences per sample, respectively. Power analyses demonstrated that K-shuff is sensitive to small differences in Sanger or Illumina® libraries. This extra sensitivity of K-shuff enabled examination of compositional differences at much deeper taxonomic levels, such as within abundant OTUs. This is especially useful when comparing communities that are compositionally very similar but functionally different. K-shuff will therefore prove beneficial for conventional microbiome analysis as well as specific hypothesis testing.
- Research Article
- 10.2144/000114330
- Sep 1, 2015
- BioTechniques
Current gene synthesis methods often incorporate a PCR amplification step in order to yield final material sufficient for resolution from multiple off-products. These amplification steps can cause stochastic sampling effects that propagate errors in gene synthesis or decrease variability when applied to the construction of randomized libraries. We have developed a simple DNA polymerase-based gene synthesis reaction, polymerase step reaction (PSR), that assembles DNA oligonucleotides in a unidirectional fashion without the need for amplification. We demonstrate that PSR is efficient, with little off-product production, no detectable error propagation, and maximized variability in the synthesis of a phage display library.
- Research Article
- 10.1371/journal.pone.0119927
- Mar 19, 2015
- PLoS ONE
Our ability to engineer organisms with new biosynthetic pathways and genetic circuits is limited by the availability of protein characterization data and the cost of synthetic DNA. With new tools for reading and writing DNA, there are opportunities for scalable assays that more efficiently and cost effectively mine for biochemical protein characteristics. To that end, we have developed the Multiplex Library Synthesis and Expression Correction (MuLSEC) method for rapid assembly, error correction, and expression characterization of many genes as a pooled library. This methodology enables gene synthesis from microarray-synthesized oligonucleotide pools with a one-pot technique, eliminating the need for robotic liquid handling. Post assembly, the gene library is subjected to an ampicillin based quality control selection, which serves as both an error correction step and a selection for proteins that are properly expressed and folded in E. coli. Next generation sequencing of post selection DNA enables quantitative analysis of gene expression characteristics. We demonstrate the feasibility of this approach by building and testing over 90 genes for empirical evidence of soluble expression. This technique reduces the problem of part characterization to multiplex oligonucleotide synthesis and deep sequencing, two technologies under extensive development with projected cost reduction.
- Research Article
- 10.1007/978-1-4939-7060-5_7
- Jan 1, 2017
- Methods in molecular biology (Clifton, N.J.)
Gene synthesis is becoming an important tool in many fields of recombinant DNA technology, including recombinant protein production. De novo gene synthesis is quickly replacing the classical cloning and mutagenesis procedures and allows generating nucleic acids for which no template is available. Here, we describe a high-throughput platform to design and produce multiple synthetic genes (<500 bp) for recombinant expression in Escherichia coli. This pipeline includes an innovative codon optimization algorithm that designs DNA sequences to maximize heterologous protein production in different hosts. The platform is based on a simple gene synthesis method that uses a PCR-based protocol to assemble synthetic DNA from pools of overlapping oligonucleotides. This technology incorporates an accurate, automated and cost-effective ligase-independent cloning step to directly integrate the synthetic genes into an effective E. coli expression vector. High-throughput production of synthetic genes is of increasing relevance to allow exploring the biological function of the extensive genomic and meta-genomic information currently available from various sources.
- Research Article
- 10.1371/journal.ppat.1008344
- Mar 9, 2020
- PLOS Pathogens
A recent genome-wide screen identified ~300 essential or growth-supporting genes in the dental caries pathogen Streptococcus mutans. To be able to study these genes, we built a CRISPR interference tool around the Cas9 nuclease (Cas9Smu) encoded in the S. mutans UA159 genome. Using a xylose-inducible dead Cas9Smu with a constitutively active single-guide RNA (sgRNA), we observed titratable repression of GFP fluorescence that compared favorably to that of Streptococcus pyogenes dCas9 (Cas9Spy). We then investigated sgRNA specificity and proto-spacer adjacent motif (PAM) requirements. Interference by sgRNAs did not occur with double or triple base-pair mutations, or if single base-pair mutations were in the 3’ end of the sgRNA. Bioinformatic analysis of >450 S. mutans genomes allied with in vivo assays revealed a similar PAM recognition sequence as Cas9Spy. Next, we created a comprehensive library of sgRNA plasmids that were directed at essential and growth-supporting genes. We discovered growth defects for 77% of the CRISPRi strains expressing sgRNAs. Phenotypes of CRISPRi strains, across several biological pathways, were assessed using fluorescence microscopy. A variety of cell structure anomalies were observed, including segregational instability of the chromosome, enlarged cells, and ovococci-to-rod shape transitions. CRISPRi was also employed to observe how silencing of cell wall glycopolysaccharide biosynthesis (rhamnose-glucose polysaccharide, RGP) affected both cell division and pathogenesis in a wax worm model. The CRISPRi tool and sgRNA library are valuable resources for characterizing essential genes in S. mutans, some of which could prove to be promising therapeutic targets.
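To make the PAM requirement concrete, here is a minimal sketch that scans the forward strand of a sequence for 20-nt protospacers immediately followed by an NGG PAM, the motif recognized by S. pyogenes Cas9. The abstract reports only that the S. mutans enzyme recognizes a similar PAM, so applying NGG here is an assumption. The function name and the test sequence are hypothetical, and the reverse strand is deliberately ignored.

```python
import re


def find_protospacers(sequence, spacer_len=20):
    """Return (start, protospacer) pairs for forward-strand sites followed by NGG."""
    sequence = sequence.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})[ACGT]GG)")
    return [(match.start(), match.group(1)) for match in pattern.finditer(sequence)]


# Hypothetical sequence: one 20-nt site followed by the PAM "TGG".
for start, spacer in find_protospacers("ACGTACGTACGTACGTACGT" + "TGG"):
    print(start, spacer)
```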
- Research Article
- 10.1093/protein/gzu029
- Aug 9, 2014
- Protein Engineering, Design and Selection
The de novo synthesis of genes is becoming increasingly common in synthetic biology studies. However, the inherent error rate (introduced by errors incurred during oligonucleotide synthesis) limits its use in synthesising protein libraries to only short genes. Here we introduce SpeedyGenes, a PCR-based method for the synthesis of diverse protein libraries that includes an error-correction procedure, enabling the efficient synthesis of large genes for use directly in functional screening. First, we demonstrate an accurate gene synthesis method by synthesising and directly screening (without pre-selection) a 747 bp gene for green fluorescent protein (yielding 85% fluorescent colonies) and a larger 1518 bp gene (a monoamine oxidase, producing 76% colonies with full catalytic activity, a 4-fold improvement over previous methods). Secondly, we show that SpeedyGenes can accommodate multiple and combinatorial variant sequences while maintaining efficient enzymatic error correction, which is particularly crucial for larger genes. In its first application for directed evolution, we demonstrate the use of SpeedyGenes in the synthesis and screening of large libraries of MAO-N variants. Using this method, libraries are synthesised, transformed and screened within 3 days. Importantly, as each mutation we introduce is controlled by the oligonucleotide sequence, SpeedyGenes enables the synthesis of large, diverse, yet controlled variant sequences for the purposes of directed evolution.
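As a rough illustration of why oligonucleotide synthesis errors restrict uncorrected libraries to short genes: if each base is assumed to be wrong independently with the same probability p, the error-free fraction of a gene of length L is (1 - p)^L. This is a simplifying model, not one taken from the paper, and the per-base rate used below is a hypothetical placeholder rather than a measured value.

```python
def error_free_fraction(length_bp, per_base_error_rate):
    """Fraction of molecules with no errors, assuming independent per-base errors."""
    if not 0.0 <= per_base_error_rate <= 1.0:
        raise ValueError("per_base_error_rate must be between 0 and 1")
    return (1.0 - per_base_error_rate) ** length_bp


HYPOTHETICAL_ERROR_RATE = 0.002  # placeholder: 1 error per 500 bases

# Gene lengths taken from the abstract (GFP and monoamine oxidase).
for length in (747, 1518):
    fraction = error_free_fraction(length, HYPOTHETICAL_ERROR_RATE)
    print(f"{length} bp: {fraction:.3f} error-free")
```

Under this model the error-free fraction falls exponentially with length, so error correction matters most for the longest genes.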
- Abstract
- 10.1016/s1525-0016(16)33350-0
- May 1, 2016
- Molecular Therapy
542. Novel Barcode-Based In Vivo Screening Method for Generating De Novo AAV Serotypes for CNS-Directed Gene Therapy
- Research Article
- 10.1186/s12864-019-5847-2
- Jul 1, 2019
- BMC Genomics
Background: Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of various DNA regulatory elements and their mutant variants. The assays are based on construction of highly diverse plasmid libraries containing two variable fragments, a region of interest (a sequence under study; ROI) and a barcode (BC) used to uniquely tag each ROI, which are separated by a constant spacer sequence. The sequences of BC–ROI combinations present in the libraries may be either known a priori or not. In the latter case, it is necessary to identify these combinations before performing functional experiments. Typically, this is done by PCR amplification of the BC–ROI regions with flanking primers, followed by next-generation sequencing (NGS) of the products. However, chimeric DNA molecules formed on templates with identical spacer fragment during the amplification process may substantially hamper the identification of genuine BC–ROI combinations, and as a result lower the performance of the assays. Results: To identify settings that minimize formation of chimeric products we tested a number of PCR amplification parameters, such as conventional and emulsion types of PCR, one- or two-round amplification strategies, amount of DNA template, number of PCR cycles, and the duration of the extension step. Using specific MPRA libraries as templates, we found that the two-round amplification of the BC–ROI regions with a very low initial template amount, an elongated extension step, and a specific number of PCR cycles result in as low as 0.30 and 0.32% of chimeric products for emulsion and conventional PCR approaches, respectively. Conclusions: We have identified PCR parameters that ensure synthesis of specific (non-chimeric) products from highly diverse MPRA plasmid libraries. In addition, we found that there is a negligible difference in performance of emulsion and conventional PCR approaches performed with the identified settings.
- Research Article
- 10.1007/978-1-4939-6343-0_5
- Sep 27, 2016
- Methods in molecular biology (Clifton, N.J.)
Gene synthesis is a fundamental technology underpinning much research in the life sciences. In particular, synthetic biology and biotechnology utilize gene synthesis to assemble any desired DNA sequence, which can then be incorporated into novel parts and pathways. Here, we describe SpeedyGenes, a gene synthesis method that can assemble DNA sequences with greater fidelity (fewer errors) than existing methods, but that can also be used to encode extensive, statistically designed sequence variation at any position in the sequence to create diverse (but accurate) variant libraries. We summarize the integrated use of GeneGenie to design DNA and oligonucleotide sequences, followed by the procedure for assembling these accurately and efficiently using SpeedyGenes.
- Research Article
- 10.2144/03353st04
- Sep 1, 2003
- BioTechniques
The normalization and subtraction of highly expressed cDNAs from relatively large tissues before cloning dramatically enhanced the gene discovery by sequencing for the mouse full-length cDNA encyclopedia, but these methods have not been suitable for limited RNA materials. To normalize and subtract full-length cDNA libraries derived from limited quantities of total RNA, here we report a method to subtract plasmid libraries excised from size-unbiased amplified lambda phage cDNA libraries that avoids heavily biasing steps such as PCR and plasmid library amplification. The proportion of full-length cDNAs and the gene discovery rate are high, and library diversity can be validated by in silico randomization.