Assessment of antibody library diversity through next generation sequencing and technical error compensation.

Marco Fantini, Ivan Arisi, Simonetta Lisi, Martina Goracci, Federico Cremisi, Marco Terrigno, Antonino Cattaneo, Luca Pandolfini, Michele Chirichella, Hikmet Budak

doi:10.1371/journal.pone.0177574


Open Access

https://doi.org/10.1371/journal.pone.0177574


Abstract

Antibody libraries are important resources for deriving antibodies for a wide range of applications, from structural and functional studies to intracellular protein interference studies to the development of new diagnostics and therapeutics. Whatever the goal, the key parameter of an antibody library is its complexity (also known as diversity), i.e. the number of distinct elements in the collection, which directly reflects the probability of finding in the library an antibody of sufficiently high affinity against a given antigen. Quantitative evaluation of antibody library complexity and quality has long been inadequately addressed, owing to the length and high similarity of the library sequences. Complexity was usually inferred from the transformation efficiency and tested by fingerprinting and/or sequencing a few hundred random library elements. Inferring complexity from such a small sample is, however, very rudimentary and gives limited information about the real diversity, because complexity does not scale linearly with sample size. Next-generation sequencing (NGS) has opened new ways to assess the complexity and quality of antibody libraries. However, much remains to be done to fully exploit the potential of NGS for the quantitative analysis of antibody repertoires and to overcome current limitations. To obtain a more reliable estimate of antibody library complexity, we present a new PCR-free NGS approach for sequencing antibody libraries on the Illumina platform, coupled with a new bioinformatic analysis and software (Diversity Estimator of Antibody Library, DEAL) that reliably estimates the complexity while taking sequencing error into account.

Highlights

Two scFv libraries and a single-domain library were created from cDNA derived from human lymphocyte RNA and amplified in bacteria
The hVH library was instead sequenced to calculate the complexity of a single-domain library and to demonstrate the advantage of single-domain sequencing
Assuming that each transformed bacterium takes up one copy of plasmid DNA, we can define the first hard cap on the library complexity as the total number of transformants obtained, determined by CFU count
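
The hard cap described in the last highlight can be sketched as simple arithmetic. The snippet below is only an illustration under stated assumptions: the function names, the plating parameters and the example numbers are hypothetical and do not come from the paper. It uses the standard plate-count relation (colonies multiplied by the dilution factor, divided by the plated volume) to estimate the total number of transformants, and then clips any diversity estimate to that value.

```python
def total_transformants(colonies, dilution_factor, plated_volume_ml, culture_volume_ml):
    """Estimate the total number of transformants (CFU) from one countable plate.

    colonies          -- colonies counted on the plate
    dilution_factor   -- fold dilution of the plated sample (e.g. 1e4)
    plated_volume_ml  -- volume spread on the plate, in mL
    culture_volume_ml -- total volume of the transformed culture, in mL
    """
    cfu_per_ml = colonies * dilution_factor / plated_volume_ml
    return cfu_per_ml * culture_volume_ml


def cap_complexity(estimated_complexity, transformants):
    """A library cannot hold more distinct clones than transformants (one plasmid each)."""
    return min(estimated_complexity, transformants)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    cap = total_transformants(colonies=150, dilution_factor=1e4,
                              plated_volume_ml=0.1, culture_volume_ml=10)
    print(f"upper bound on complexity: {cap:.2e}")
    print(f"capped estimate: {cap_complexity(5e8, cap):.2e}")
```

The cap only holds under the one-plasmid-per-bacterium assumption stated in the highlight; it is an upper bound on the complexity and not an estimate of it.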

Summary

Introduction

Despite the simplicity and importance of this concept, until recently the diversity of antibody repertoires could not be measured in a reliable and quantitative way, and it was approximated by the transformation efficiency of the bacteria used to amplify the library [18,20,21]. To corroborate this estimate, the standard procedure in the literature has been to test the fingerprint pattern or the sequencing data of a few hundred library members for the presence of duplicates [14,22]. The 10,000th element sampled does not have the same probability of being unique as the 100th element.
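
The point about the 100th versus the 10,000th element can be made concrete with a small sketch. It assumes an idealized library in which every clone is equally abundant and reads are drawn independently with replacement; real libraries are skewed, and this is not the DEAL algorithm. The function names and the example library sizes are hypothetical.

```python
import math


def prob_kth_draw_is_new(k, library_size):
    """Probability that the k-th sampled element was not seen in the first k-1 draws.

    Assumes `library_size` equally abundant clones, sampled with replacement.
    """
    return math.exp((k - 1) * math.log1p(-1.0 / library_size))


def expected_distinct(n_draws, library_size):
    """Expected number of distinct clones observed after n_draws samples."""
    # N * (1 - (1 - 1/N)^n), written with log1p/expm1 for numerical stability.
    return library_size * -math.expm1(n_draws * math.log1p(-1.0 / library_size))


if __name__ == "__main__":
    size = 10_000  # hypothetical library size
    for k in (100, 10_000):
        print(f"draw {k}: P(new) = {prob_kth_draw_is_new(k, size):.3f}")

    # A few hundred reads cannot tell a large library from a very large one:
    for size in (1e6, 1e8):
        dup = 300 - expected_distinct(300, size)
        print(f"library of {size:.0e}: expected duplicates in 300 reads = {dup:.4f}")
```

Under these assumptions the expected number of distinct clones saturates as sampling proceeds, so it does not grow linearly with sample size, and a sample of a few hundred members is expected to contain almost no duplicates whether the library holds a million or a hundred million clones.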

Methods

Results

Conclusion