Fast clonal family inference from large-scale B cell repertoire sequencing data

Kaixuan Wang, Xihao Hu, Jian Zhang

doi:10.1016/j.crmeth.2023.100601

Open Access

Abstract

Advances in high-throughput sequencing technologies have facilitated the large-scale characterization of B cell receptor (BCR) repertoires. However, the vast amount and high diversity of the BCR sequences pose challenges for efficient and biologically meaningful analysis. Here, we introduce fastBCR, an efficient computational approach for inferring B cell clonal families from massive BCR heavy chain sequences. We demonstrate that fastBCR substantially reduces the running time while ensuring high accuracy on simulated datasets with diverse numbers of B cell lineages and varying mutation rates. We apply fastBCR to real BCR sequencing data from peripheral blood samples of COVID-19 patients, showing that the inferred clonal families display disease-associated features, as well as corresponding antigen-binding specificity and affinity. Overall, our results demonstrate the advantages of fastBCR for analyzing BCR repertoire data, which will facilitate the identification of disease-associated antibodies and improve our understanding of the B cell immune response.

Full Text