Abstract

Determination of framework regions (FRs) and complementarity determining regions (CDRs) in an antibody is essential for understanding the underlying biology as well as for antibody engineering and optimization. However, there are no computational algorithms available to delimit an antibody sequence or a library of sequences into FRs and CDRs in a coherent and automatic fashion. Based upon the mapping relationships between mature antibody sequences and their corresponding germline gene segments, a novel computational algorithm has been developed for automatic determination of CDRs. Even though a human can make more than 10^12 different antibody molecules in the preimmune repertoire to fight off invading pathogens, these antibodies are generated from rearrangements of a very limited number of germline variable (V), diversity (D) and joining (J) gene segments, followed by somatic hypermutation. The framework regions FR1, FR2 and FR3 in mature antibodies are encoded by germline V gene segments, while FR4 is encoded by J gene segments. Since there are only a limited number of germline gene segments, these genes can be pre-delimited to generate a knowledge base of FRs and CDRs. Then, for a given antibody sequence, the algorithm scans each pre-delimited gene in the knowledge base, finds the best matching V and J segments, and identifies the FRs and CDRs accordingly. The described algorithm was stringently tested using nearly 25,000 human antibody sequences from NCBI and proved to be very robust: over 99.7% of the antibody sequences could be delimited computationally. Of the delimited sequences, only 0.28% have somatic insertions or deletions in FRs, and their delimitation results need manual checking. Another feature of the algorithm is that it is independent of the CDR definition and can easily be extended to CDR definitions other than the most widely used Kabat, Chothia and IMGT definitions.
In addition to delimiting antibody sequences into FRs and CDRs, the described algorithm is useful for sequence annotation and sequence quality control, because it detects unusual sequence patterns and features. Furthermore, the algorithm could readily be embedded into other applications, for example to create a gene-family-specific PSSM (position-specific scoring matrix) for antibody engineering or to number an antibody sequence automatically.
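
The scan-and-transfer idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the `delimit` function, the toy "germline" strings and the scoring by ungapped identity are all assumptions made for illustration. In particular, ungapped matching cannot handle insertions or deletions in FRs, which is the case the abstract flags for manual checking.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GermlineV:
    """Hypothetical knowledge-base entry for a pre-delimited V segment."""
    name: str
    seq: str
    ends: tuple  # exclusive end offsets of FR1, CDR1, FR2, CDR2, FR3 in seq

@dataclass(frozen=True)
class GermlineJ:
    """Hypothetical knowledge-base entry: the FR4 portion of a J segment."""
    name: str
    fr4: str

def identity(a, b):
    """Count matching positions in an ungapped, left-anchored comparison."""
    return sum(x == y for x, y in zip(a, b))

def best_v(query, v_genes):
    """Scan every V segment and keep the one with the highest identity."""
    return max(v_genes, key=lambda g: identity(query, g.seq))

def best_j(query, j_genes, search_from):
    """Slide each FR4 along the query tail; return (gene, start) of the best hit."""
    best = None
    for gene in j_genes:
        n = len(gene.fr4)
        for start in range(search_from, len(query) - n + 1):
            score = identity(query[start:start + n], gene.fr4)
            if best is None or score > best[0]:
                best = (score, gene, start)
    if best is None:
        raise ValueError("query too short to contain an FR4")
    return best[1], best[2]

def delimit(query, v_genes, j_genes):
    """Transfer the boundaries of the best V and J segments onto the query."""
    v = best_v(query, v_genes)
    regions, prev = {}, 0
    for label, end in zip(("FR1", "CDR1", "FR2", "CDR2", "FR3"), v.ends):
        regions[label] = query[prev:end]
        prev = end
    j, fr4_start = best_j(query, j_genes, prev)
    regions["CDR3"] = query[prev:fr4_start]  # whatever lies between FR3 and FR4
    regions["FR4"] = query[fr4_start:fr4_start + len(j.fr4)]
    return v.name, j.name, regions

# Toy knowledge base: invented strings, not real germline sequences.
V_GENES = [
    GermlineV("V_A", "AAAAACCCDDDDEEEFFFFF", (5, 8, 12, 15, 20)),
    GermlineV("V_B", "GGGGHHHHKKKKLLMMMMMM", (4, 8, 12, 14, 20)),
]
J_GENES = [GermlineJ("J_1", "WGQG"), GermlineJ("J_2", "FGGG")]

if __name__ == "__main__":
    # A V_A-derived query carrying two point mutations and a short CDR3.
    print(delimit("AAAAACSCDDDDEEEFFFTFRYYWGQG", V_GENES, J_GENES))
```

Because the region boundaries are stored in the knowledge base rather than in the matching code, the same scan would work for a different CDR definition simply by supplying different `ends` offsets, which mirrors the definition independence claimed in the abstract.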

