Abstract

Immunoglobulin loci-transgenic animals are widely used in antibody discovery and increasingly in vaccine response modelling. In this study, we phenotypically characterised B-cell populations from the Intelliselect Transgenic mouse (Kymouse) demonstrating full B-cell development competence. Comparison of the naïve B-cell receptor (BCR) repertoires of Kymice with naïve human and murine BCR repertoires revealed key differences in germline gene usage and junctional diversification. These differences result in Kymice having CDRH3 length and diversity intermediate between mice and humans. To compare the structural space explored by CDRH3s in each species’ repertoire, we used computational structure prediction to show that Kymouse naïve BCR repertoires are more human-like than mouse-like in their predicted distribution of CDRH3 shape. Our combined sequence and structural analysis indicates that the naïve Kymouse BCR repertoire is diverse with key similarities to human repertoires, while immunophenotyping confirms that selected naïve B cells are able to go through complete development.

Editor's evaluation

This is an important study that defines the limits of using human Ig sequences in genetically modified mice to explore human immune responses. The study was carefully designed, and the results should be of interest to a wide readership and will be important for grounding future studies/uses of the Ky mouse.
https://doi.org/10.7554/eLife.81629.sa0

Introduction

Twenty-five years of progress in genetic engineering from the first immunoglobulin (Ig) transgenic mouse (Brüggemann et al., 1989) culminated in 2014 in the integration of a complete human Ig haplotype in mice for the first time (Lee et al., 2014). Humanised Ig loci-transgenic animal models have proven extremely useful in therapeutic antibody discovery; 20 of the 127 therapeutic antibodies licensed in the US or EU as of April 2022 were derived from transgenic mouse platforms (data from Thera-SAbDab; Raybould et al., 2020). Transgenic platforms have also found a new application in vaccine response modelling (Sok et al., 2016; Pantophlet et al., 2017; Walls et al., 2020). As humanised animal models become the source of a growing number of therapeutics and play an increasingly important role in the evaluation of novel vaccine candidates, it is crucial to understand the degree to which their B-cell repertoires can be considered representative of humans. Contemporary Ig transgenic animal models vary according to the number of genes and localisation of the inserted human Ig loci (Green, 2014; Brüggemann et al., 2015). In Kymab’s Intelliselect Transgenic mouse (Kymouse), a complete set of human variable (V), diversity (D), and junction (J) genes of the IGH locus as well as the V and J genes of the Igλ and Igκ loci were inserted at the sites of the endogenous mouse loci. The mouse constant regions were retained, preserving downstream interactions with endogenous intracellular signalling components and cell membrane Fc receptors, resulting in functional, fully active chimeric antibodies. Kymice exhibit normal B-cell production and maturation and the resulting B-cell receptors (BCRs) are diverse, with human-like CDRH3 lengths and evidence of somatic hypermutation (Lee et al., 2014).
However, the baseline phenotypic diversity in B cells and BCRs in the Kymouse has not been fully described. B cells are an integral part of the humoral immune response due to their ability to produce antibodies against diverse antigens, providing protection against infection. B cells originate from hematopoietic stem cells in the bone marrow, where they undergo several phases of antigen-independent development leading to the generation of immature B cells. B cells are routinely classified based on their maturation status, antibody isotype, and effector function. Ig gene rearrangement during these early stages of B-cell development results in the expression of a mature BCR that is capable of binding to antigen. This is followed by positive and negative selection processes, to eliminate non-functional and self-reactive immature B cells. Surviving B cells complete antigen-independent maturation in the spleen, producing immunocompetent naïve mature B cells that subsequently develop into either follicular or marginal zone B cells. In response to vaccination or invading microbes, antigen-specific B cells within secondary lymphoid organs differentiate into antibody-producing cells, early memory cells, or rapidly proliferate and form structures known as germinal centres (GCs) (Allen et al., 2007). GCs are inducible lymphoid microenvironments that support the generation of affinity-matured, isotype-switched memory B cells and antibody-secreting plasma cells. Long-lived plasma cells secrete high-affinity antibodies, and memory cells can readily elicit an efficient antibody immune response upon re-exposure to the immune stimuli (Corcoran and Tarlinton, 2016; Weisel and Shlomchik, 2017). Iterative cycles of B-cell hypermutation and selection within the GC lead to an accumulation of affinity-enhancing mutations and ultimately to the progressive increase of serum antibody affinity, a process known as antibody affinity maturation (Jacob et al., 1991).
Antibody-secreting plasma cells play critical roles in protective immunity on the one hand and antibody-mediated autoimmune disease on the other. During immune responses a small fraction of newly generated plasma cells enter either the bone marrow or the lamina propria of the small intestine where they populate specialised survival niches and become long-lived plasma cells (Lemke et al., 2016), thus maintaining antibody titres for extended periods. The variable domain of a BCR is composed of a heavy chain and a light chain. Each of the chains in the antibody has three hypervariable regions known as the complementarity-determining regions (CDRs), which make most contacts with the antigen. The heavy chain locus consists of variable (V), diversity (D), and joining (J) gene segments, which recombine to form the variable domain of the heavy chain (VH). These genes are referred to as the IGHV, IGHD, and IGHJ genes, respectively. The first two CDRs of the heavy chain, CDRH1 and CDRH2, are encoded by the IGHV gene alone, while the third and most variable CDR, CDRH3, spans the IGHV, IGHD, and IGHJ gene junctions. The insertion of random and palindromic nucleotides at the VD and DJ junctions further contributes to the diversity of the CDRH3, ensuring binding diversity to different antigens and epitopes (Xu and Davis, 2000). Each of the light chain loci, kappa and lambda, consists of V and J gene segments but no D gene segments, and both the germline and the recombined light chain variable region (VL) are less diverse than their heavy chain counterparts (Collins and Watson, 2018). These genes are referred to as the IGKV and IGKJ or IGLV and IGLJ genes for the kappa and lambda chains, respectively; we use IGKLV or IGKLJ to refer to the V or J genes of either light chain locus collectively.
Due to the greater diversity of the heavy chain, most next-generation sequencing (NGS) of BCR repertoires (BCR-seq) has focused on the heavy chain; lower throughput methods exist for identifying the light chain pairing (Curtis and Lee, 2020). The resulting BCR sequences can be aligned to reference germline gene databases to infer most likely germline gene origins and insertions or deletions at the V(D) or (D)J junctions (Ye et al., 2013). Alignment of BCRs to common germline genes also allows inference about clonal structure, as sequences sharing common germline gene assignments as well as homology in the CDRH3 loop may be inferred to have arisen from a common progenitor B cell (Greiff et al., 2015; Yaari and Kleinstein, 2015). The amino acid sequences of these heavy chains can also be functionally examined through annotation with structural tools (Kovaltsuk et al., 2017; Marks and Deane, 2020). Changes in the pattern of CDRH3 shapes in BCR repertoires have been observed along the B-cell differentiation axis in both humans and mice (Kovaltsuk et al., 2020) but the extent to which the CDRH3 shape differs between humans and mice has not been explored. Here, we have characterised the frequency of GC B cells, memory B cells, and long-lived plasma cells from spleens, lymph nodes, and bone marrows of antigen-inexperienced Kymice (Lee et al., 2014). The frequencies of these B-cell subsets as well as the breadth and nature of their BCR repertoires constitute the first step in our understanding of how the immune system of this model organism responds to different antigens, vaccines, and pathogens that are both of scientific and of therapeutic interest. Examining the nature of the naïve BCR repertoire in Kymice through both single-cell and bulk sequencing and structural analysis shows that the Kymouse naïve BCR repertoires are more human-like in their distribution of CDRH3 shapes. 
Results

Antigen naïve Kymice exhibit similar B-cell subpopulation frequencies

We characterised the B-cell subpopulations within spleen and lymph node samples of 12 antigen-inexperienced Kymice using an 11-colour flow cytometry panel that incorporated a range of B-cell lineage markers to identify both murine memory and GC B-cell populations. A canonical gating scheme organises B cells by their maturation status – from transitional B cells through naïve, non-switched and ultimately class-switched memory B cells. To look at the heterogeneity of the B-cell subpopulations in more detail, we incorporated unbiased Leiden clustering on the multi-parameter fluorescence-activated cell sorting (FACS) data. Sorted cells separated into two large clusters, B cells and non-B cells (Figure 1A). As expected, within the B-cell population immature isotypes (IgD and IgM) were enriched in naïve cells, whereas markers CD95 and GL7 were enriched in GC cells (Figure 1C). The murine memory B-cell population has been described to comprise five subpopulations defined by the progressive transition from naïve-like to more memory-like cells and the surface markers CD80, CD73, and PD-L2 have previously been reported to enable their distinction (Tomayko et al., 2010). Using a low dimensional UMAP representation, we observed distinct staining patterns of these markers in the memory B-cell compartment and were able to distinguish between 12 major B-cell populations, including transitional, naïve, and activated as well as six distinct memory subsets, defined as (1) PD-L2hi, (2) CD73hi CD80hi PD-L2low, (3) CD80low, (4) PD-L2hi CD80hi CD73low, (5) CD80hi, and (6) CD73hi (Figure 1A). GC B cells formed a small and well-separated cluster whose low frequency was not surprising given that these were antigen naïve animals.
Based on the three memory markers (CD80, CD73, and PD-L2) the relative frequency of the total memory B cells was 6.60% ± 2.51%, and the frequency of CD95 and GL7 positive GC cells was 0.18% ± 0.26%. The median expression profile of each subset is shown as a heatmap (Figure 1B). The antigen naïve B-cell populations in un-immunised and non-infected Kymice are therefore normal and consistent between different Kymice (Figure 1B, right panel).

Figure 1. UMAP projections of sorted cell populations from the spleen, lymph nodes and bone marrow of Kymice. UMAP projections of sorted cell populations identified using unsupervised clustering from spleen and lymph nodes (A) or from bone marrow samples (C) can be used to visualise marker expression on the combined cells that were used for sorting and to characterise their phenotypes. UMAP projections show a clear separation between B cells and non-B cells for both sample types. The projections are coloured by the 12 resolved cell types in the spleen and lymph nodes (A) and the six resolved in the bone marrow samples (C). Normalised and scaled marker expression and frequencies were used to visualise mouse-to-mouse variation for each of the resolved cell types in the spleen and lymph nodes (B) or the bone marrow samples (D). The expression profiles are homogeneous across mice. In spleens and lymph nodes non-class-switched IgD+ naïve B cells were the predominant cell population, followed by non-B cells and IgM+ naïve B cells, reflective of a tissue that has not been exposed to antigen. In bone marrow samples on the other hand the most numerous populations were non-B cells, followed closely by class-switched IgG+ B cells, a result that reflects a tissue niche that supports survival of long-lived antibody producing cells.
B cells in the bone marrow are class switched with variable levels of surface BCR expression

To understand the B-cell profile beyond spleen and lymph nodes, we also profiled the bone marrow cells of mice. We characterised bone marrow samples using a nine-colour flow cytometry panel that incorporated a range of B-cell lineage markers. The staining panel was designed to identify plasma cells as well as class-switched B-cell subsets in antigen-inexperienced mice. As expected, we saw that the cells separated again into two large clusters, B cells and non-B cells (Figure 1C). The expression profiles of the subsets were again plotted as heatmaps showing the median expression profiles of each subset (Figure 1D). Within the B-cell cluster, we identified several discrete subsets marked by the expression of different BCR isotypes. B cells were clustered into five distinct subpopulations, including immature IgD+ and IgM+ cells, mature IgM+ and IgG+ B cells, and plasma cells. The markers TACI and Sca-1 were enriched in plasma cells as expected, whereas CD138, a common plasma cell marker, did not show a high level of separation between the different cell types. Unsurprisingly, we saw the biggest separation between B cells and non-B cells, and a continuum of B-cell subtypes from IgD, through IgM, to IgG-expressing B cells as well as a discrete cluster identified as plasma cells. The frequencies of the plasma cells were low (0.90% ± 0.28%) in comparison to other B-cell subtypes, perhaps not surprising given that these were antigen naïve animals.

The Kymouse naïve antibody sequence repertoire is more human-like than murine-like

Using high-throughput paired sequencing we recovered 3175 full-length paired IgM VH and VL sequences and a further 451,655 full-length unpaired IgM VH sequences from naïve B cells extracted from the spleens and lymph nodes of 22 Kymice.
In order to evaluate the humanness of the Kymouse naïve B-cell sequence repertoire, we performed two comparisons: a comparison of the lower-depth single-cell paired sequencing data with a published, high-depth paired human naïve dataset, and a more extensive comparison of the bulk VH sequences to equivalent datasets of 338,677 VH sequences from human naïve B cells, and 268,285 VH sequences from C57BL/6 mice.

Paired VH and VL sequencing suggests that Kymice produce primary repertoires with differing germline gene usages than humans

One of the most pronounced differences in heavy/light chain pairing between wild-type mice and humans that has been described is the usage ratio of the Igκ and Igλ chains in the BCRs of circulating B cells. Humans have an Igκ/Igλ ratio of approximately 60:40 in serum and in mature B cells: the Igκ/Igλ ratio in the human naïve single-cell dataset was 62:38 (IQR: 64:36, 59:41). Mice have an Igκ/Igλ ratio of 95:5 in serum and 90:10 on B cells (McGuire and Vitetta, 1981). We used the 3175 paired VH and VL sequences to calculate the Igκ/Igλ ratio in Kymice and found a ratio of 51:49 (IQR: 55:45, 47:53), which is considerably closer to the human ratio, as both reported in the literature and measured in the human paired dataset, than the mouse ratio. We next analysed the heavy and light chain gene usage in the paired data and compared the observed frequencies with those observed in the large paired human dataset (Figure 2). There are differences in usage of all gene segments on both chains: IGHV genes of subgroup IGHV3 and IGHV6 are expressed at a higher rate in Kymouse vs human repertoires (an average of 66.4% vs 44.8% and 4.0% vs 0.7% of Kymouse and human repertoires, respectively), with a decreased usage of IGHV1 and IGHV4 genes relative to human repertoires (an average of 10.5% vs 19.4% and 15.8% vs 26.1% of Kymouse and human repertoires, respectively) (Figure 2A).
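The Igκ/Igλ ratio and the subgroup usage percentages quoted above are simple tallies over germline gene assignments. A minimal sketch follows, assuming each sequence carries an IMGT-style V gene call such as `IGKV1-39*01`; the function names and the toy gene calls are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter


def light_chain_ratio(light_v_calls):
    """Return (kappa %, lambda %) from light-chain V gene calls."""
    kappa = sum(1 for call in light_v_calls if call.startswith("IGK"))
    lam = sum(1 for call in light_v_calls if call.startswith("IGL"))
    total = kappa + lam
    if total == 0:
        raise ValueError("no kappa or lambda gene calls found")
    return 100.0 * kappa / total, 100.0 * lam / total


def subgroup_usage(gene_calls):
    """Percentage usage per gene subgroup, e.g. 'IGHV3-23*01' -> 'IGHV3'.

    Assumes plain IMGT-style names; orphon genes would need extra handling.
    """
    subgroups = [call.split("-")[0].split("*")[0] for call in gene_calls]
    counts = Counter(subgroups)
    return {sg: 100.0 * n / len(subgroups) for sg, n in counts.items()}


# Toy data for illustration only (not from the paper).
light_calls = ["IGKV1-39*01", "IGLV2-14*01", "IGKV3-20*01", "IGLV1-44*01"]
heavy_calls = ["IGHV3-23*01", "IGHV3-30*18", "IGHV1-69*01", "IGHV4-34*01"]

kappa_pct, lambda_pct = light_chain_ratio(light_calls)
heavy_usage = subgroup_usage(heavy_calls)
print(f"kappa:lambda = {kappa_pct:.0f}:{lambda_pct:.0f}")
print(heavy_usage)
```

Computing these percentages per animal, rather than on pooled sequences, is what would allow an average and an interquartile range to be reported across individuals.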
The IGHD gene subgroup usage was likewise different with increased usage of IGHD1, IGHD6, and IGHD7 in the Kymouse, and a significantly reduced usage of IGHD2, IGHD4, and IGHD5 (Figure 2B). The Kymice use IGHJ6 at a significantly higher frequency (on average 39.3% in Kymice vs 24.5% in human repertoires), with reduced usage of IGHJ3, IGHJ4, and IGHJ5 (13.5% vs 8.5%, 45.2% vs 39.8%, and 12.8% vs 9.5% in human vs Kymouse repertoires) (Figure 2C).

Figure 2 (with 2 supplements). Single-cell sequencing of Kymouse BCR repertoires reveals significant differences in encoding gene frequencies in comparison to human repertoires. The non-mutated, naïve IgM subset of the paired VH and VL human and Kymouse sequences differ significantly in the frequencies of their encoding genes, as well as in their CDRH3 length, IGHD gene alignment length, and the length of their VD and DJ insertions. (A) shows the significant differences in the Kymouse’s usage of IGHV gene subgroups, most notably significantly greater IGHV3 usage (on average 66.4% of Kymouse repertoires vs 44.8% of human repertoires) and significantly lower IGHV1 usage (19.4% of human repertoires vs 10.5% of Kymouse repertoires). (B) likewise shows a number of differentially expressed gene subgroups: the largest differences are in IGHD1 (24.7% of Kymouse repertoires vs 9.3% of human repertoires on average), IGHD2 (16.1% of human repertoires vs 5.2% of Kymouse repertoires), IGHD5 (9.6% of human repertoires vs 4.0% of Kymouse repertoires), and IGHD7 (28.6% of Kymouse repertoires vs 20.4% of human repertoires). There are other notable differences in IGHJ gene usage (C) where there is significantly greater IGHJ6 usage in Kymice (39.3% vs 24.6% in humans) and significantly less IGHJ3 usage (8.5% vs 13.5% in humans).
There are also differences in the genes encoding the light chain, the IGKLV (D) and IGKLJ (E) genes, such as significantly greater use of IGKV3 by human repertoires (21.9% of human repertoires vs 9.5% of Kymouse repertoires) and significantly greater use of IGLV2 by Kymouse repertoires (13.5% vs 8.0% of human repertoires on average). (E) shows significantly greater IGKJ2 usage by human repertoires (16.3% vs 5.4% in Kymouse) and significantly greater IGLJ3 usage by Kymouse repertoires (19.7% vs 12.4% in humans). (F) displays the distribution of CDRH3 length (which differs significantly, with human CDRH3s on average 1.9aa longer), CDRL3 length (no significant difference), and IGHD germline alignment length, which differs by 0.2aa on average, as well as the distribution of VD and DJ insertion lengths, which differ significantly and by nearly a factor of 2 (1.9× as many insertions on average in the VD junction, and 1.97× as many at the DJ junction, in human repertoires).

The genes encoding the VL likewise show a difference at the level of IGK/LV and IGK/LJ gene subgroup usage (Figure 2D and E) as is expected given the ~10% greater proportion of lambda chains found in Kymouse repertoires. Such differences in gene segment frequency persist when comparing frequencies within kappa and lambda chains (Figure 2—figure supplement 1). Within the kappa repertoire, Kymice show significantly greater usage of IGKV1, IGKV2, and IGKV5 and significantly reduced usage of IGKV3 relative to human repertoires (53.7% vs 46.7%, 16.8% vs 8.9%, 2.4% vs 0.1%, and 18.9% vs 35.3% in Kymice vs humans on average), as well as significantly greater IGKJ3, IGKJ4, and IGKJ5 usage and significantly reduced IGKJ2 usage (15.0% vs 10.7%, 29.3% vs 23.1%, 15.4% vs 8.6%, and 10.8% vs 26.2%). Within the lambda repertoire, Kymice use significantly less IGLV1 (20.1% vs 32.0%) and exhibit an increased usage of IGLV2, IGLV4, IGLV5, IGLV7, and IGLV9 (27.0% vs 20.9%, 3.2% vs 1.9%, 2.7% vs 0.8%, 3.6% vs 1.3%, and 3.0% vs 0.5%).
There is also a significant reduction in the usage of IGLJ1 in Kymice vs. humans (8.4% vs 16.2%) and an increase in IGLJ3 usage (39.5% vs 31.1%). At the level of gene subgroups, the Kymouse repertoires use a significantly more diverse set of light chain genes than do the human repertoires, when subsampling repertoires to the minimal sample size of 105 sequences (Figure 2—figure supplement 2). The Kymouse repertoires also use a significantly greater number of IGHD genes, but a significantly reduced number of IGHV genes. IGHV subgroup diversity is lower on average in Kymice but not significantly so. The reduced IGHV gene diversity and increased IGKLV gene diversity of the Kymouse result in comparable combinatorial diversity of these genes (Figure 2—figure supplement 2). In order to elucidate whether the Kymouse and human paired repertoires could be separated on the basis of their gene usage, we performed repeated random subsampling to the minimum repertoire size (105 sequences), calculated Z-normalised gene frequencies per repertoire, performed hierarchical clustering and calculated the adjusted Rand index. The adjusted Rand index is a measure of how well the human and Kymouse repertoires could be clustered. A value of 1.0 indicates a perfect clustering. We found that the clearest separation could be achieved for IGK/LV gene usage, where Kymouse and human repertoires could be perfectly separated in 96% of repeats. IGHD gene was the next best separator, clustering the repertoires separately in 62% of repeats. IGHV gene subgroup clustered repertoires separately in 26% of repeats. IGHV gene, IGK/LV gene subgroup, IGK/LJ, and IGHJ gene clustered repertoires separately in 6%, 5%, 4%, and 0% of repeats, respectively. We further examined the distributions of CDRH3 and CDRL3 length (Figure 2F).
We found that human CDRH3s were on average longer than Kymouse CDRH3s (16.4aa ± 0.01 vs 14.5aa ± 0.13); there was no significant difference in CDRL3 length (average 9.8aa ± 0.002 vs 9.7aa ± 0.2). To identify possible causes for a difference in CDRH3 length, we compared IGHD gene alignment length and length of nucleotide insertions at the VD and DJ junctions. While there was a small but significant (5% level) difference in IGHD gene alignment length (average 4.6aa ± 0.06 vs 4.4aa ± 0.08), the largest difference is in insertion length at both the VD and DJ junctions (average 7.8nt ± 0.01 vs 4.1nt ± 0.1, and 7.3nt ± 0.02 vs 3.7nt ± 0.1). A significantly greater proportion of the Kymouse primary repertoire does not have junctional insertions compared to the human repertoire: 19.1% of the Kymouse sequences had no VD insertions, while 8.2% of the human sequences lacked VD insertions; 14.1% of Kymouse sequences had no DJ insertions while 4.5% of human sequences lacked these insertions. Over 10 times as many of the sequences in the Kymouse repertoires lacked junctional insertions entirely compared to human repertoires (3.8% in Kymice vs. 0.3% in humans).

Higher-depth bulk VH sequencing reveals differences in germline gene usage consistent with paired sequencing, and allows examination of CDRH3 diversity

While the single-cell data is a good measure of summary statistics such as gene usage frequencies, it is not sufficient to provide insight into features such as diversity within repertoires and overlap among repertoires, either at the level of clones or structurally. Bulk VH sequencing was used to increase the depth of our repertoire sampling by over 200-fold (from on average 218.4 heavy chain clonotypes per sample to an average of 53,006.2 per sample in the bulk VH experiments).
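The fold differences quoted in this part of the text follow directly from the reported averages. The short check below recomputes them; the numbers are copied from the text, while the variable names are illustrative only.

```python
# Averages as reported in the text (human vs Kymouse, single-cell data).
vd_insertions_nt = {"human": 7.8, "kymouse": 4.1}
dj_insertions_nt = {"human": 7.3, "kymouse": 3.7}
# Percentage of sequences reported as lacking junctional insertions.
no_insertions_pct = {"human": 0.3, "kymouse": 3.8}
# Average heavy chain clonotypes per sample.
clonotypes_per_sample = {"single_cell": 218.4, "bulk": 53006.2}

vd_fold = vd_insertions_nt["human"] / vd_insertions_nt["kymouse"]
dj_fold = dj_insertions_nt["human"] / dj_insertions_nt["kymouse"]
no_insertion_fold = no_insertions_pct["kymouse"] / no_insertions_pct["human"]
depth_fold = clonotypes_per_sample["bulk"] / clonotypes_per_sample["single_cell"]

print(f"VD insertions, human/Kymouse: {vd_fold:.2f}x")
print(f"DJ insertions, human/Kymouse: {dj_fold:.2f}x")
print(f"No junctional insertions, Kymouse/human: {no_insertion_fold:.1f}x")
print(f"Sampling depth, bulk/single-cell: {depth_fold:.0f}-fold")
```

These ratios are consistent with the "nearly a factor of 2" difference in insertion length, the "over 10 times" difference in insertion-free sequences, and the "over 200-fold" increase in depth.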
We used the bulk VH data to re-calculate gene usage frequencies in Kymice at a higher depth, and compare the usage frequency of the IGHV, IGHD, and IGHJ germline genes to those in bulk heavy chain human data of comparable depth. At such enhanced depth, all IGHD and IGHJ genes in the Kymouse haplotype were observed in all repertoires, as were all non-orphon IGHD and IGHJ genes in the general human database. There were 13 IGHV genes found in human repertoires that are absent in the Kymouse haplotype which were observed in some but not all human repertoires (Figure 3—figure supplement 1). Statistical testing on a gene-by-gene basis was largely in agreement with the findings from the single-cell analysis (Figure 3—figure supplement 2) with the exception that IGHJ4 usage was elevated in the bulk data relative to the single-cell data, and IGHJ6 usage lower such that it is not significantly different from human repertoires (Figure 3—figure supplement 3). In addition to statistical testing on a gene-by-gene basis (Figure 3—figure supplement 2), we used hierarchical clustering to compare the gene usage profiles of individual Kymice and humans, building dendrograms to show the relationships between the individuals’ gene usage profiles. The frequencies as determined by sequence abundance are shown to differ in such a way that the two repertoire types can be clustered separately. The hierarchical clustering of the IGHV genes showed that the Kymice and humans form nearly separate monophyletic clusters except for a single outlier human subject (Figure 3A). 
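The clustering comparisons of gene usage profiles, and the subsample, Z-normalise, cluster and adjusted Rand index procedure used for the paired data, can be sketched as follows. The distance metric and linkage method are not stated in the text, so Euclidean distance, average linkage and per-gene Z-normalisation across repertoires are assumptions here. All function names and the toy repertoires are hypothetical.

```python
import random
from collections import Counter
from math import comb, sqrt


def subsampled_frequencies(gene_calls, genes, size, rng):
    """Gene frequencies in a random subsample (without replacement)."""
    counts = Counter(rng.sample(gene_calls, size))
    return [counts[g] / size for g in genes]


def z_normalise(profiles):
    """Z-normalise each gene (column) across repertoires (rows)."""
    columns = []
    for col in zip(*profiles):
        mean = sum(col) / len(col)
        sd = sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        columns.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*columns)]


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_two_clusters(points):
    """Agglomerative clustering, merging until two clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 2:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(euclidean(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index; 1.0 means the two partitions agree exactly."""
    n = len(true_labels)
    cells = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    rows = sum(comb(c, 2) for c in Counter(true_labels).values())
    cols = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = rows * cols / comb(n, 2)
    maximum = (rows + cols) / 2
    if maximum == expected:
        return 1.0
    return (cells - expected) / (maximum - expected)


def fraction_perfectly_separated(repertoires, true_labels, genes, size, repeats, seed=0):
    """Share of subsampling repeats in which clustering recovers the species."""
    rng = random.Random(seed)
    perfect = 0
    for _ in range(repeats):
        profiles = [subsampled_frequencies(r, genes, size, rng) for r in repertoires]
        labels = average_linkage_two_clusters(z_normalise(profiles))
        perfect += adjusted_rand_index(true_labels, labels) == 1.0
    return perfect / repeats


# Toy repertoires with made-up gene counts (illustration only).
GENES = ["IGHD1", "IGHD2", "IGHD3"]
human_like = ["IGHD1"] * 10 + ["IGHD2"] * 60 + ["IGHD3"] * 30
kymouse_like = ["IGHD1"] * 30 + ["IGHD2"] * 10 + ["IGHD3"] * 60
repertoires = [human_like] * 3 + [kymouse_like] * 3
species = [0, 0, 0, 1, 1, 1]
result = fraction_perfectly_separated(repertoires, species, GENES, size=50, repeats=20)
print(f"perfectly separated in {100 * result:.0f}% of repeats")
```

Counting only repeats with an adjusted Rand index of exactly 1.0 mirrors the way separation is reported in the text, as the percentage of repeats in which the two repertoire types were perfectly separated.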
Most of the variation in IGHV gene usage is explained by the IGHV gene subgroup usage: clustering by IGHV gene subgroup usage separates humans and Kymice without the outlier human sample, with Kymice using a lower proportion of IGHV1 and IGHV2 genes (5.3% vs 22.6% and 0.2% vs 2.1% in Kymouse vs human repertoires respectively for these subgroups), and an increased IGHV3 (51.6% vs 39.7%), IGHV4 (35.3% vs 31.6%), and IGHV6 usage (4.7% vs 1.0%) (Figure 3B).

Figure 3 (with 3 supplements). Bulk VH repertoires of Kymice exhibit differences in encoding gene frequencies versus human repertoires. Gene usage clustermaps for (A) IGHV genes, (B) IGHV subgroups, (C) IGHJ genes, and (D) IGHD genes from bulk VH sequencing reveal differences in gene frequencies that are sufficient to cluster most Kymouse and human repertoires. The IGHV clustermaps show a separation between human (blue) and Kymouse (black) repertoires, with lower usage of IGHV1 and IGHV2 in the Kymouse (5.3% vs 22.6% and 0.2% vs 2.1% in Kymice and humans, respectively) and increased usage of IGHV3 (51.6% vs 39.7%), IGHV4 (35.3% vs 31.6%), and IGHV6 (4.7% vs 1.0%). There are also differences in usage of IGHJ genes with a preference in the Kymouse repertoires for IGHJ4 (46.7% vs 42.6%). The IGHD gene usage shows the clearest distinction between Kymouse and human repertoires with greater usage of IGHD1, IGHD6, and IGHD7 (24.2% vs 10.3%, 27.4% vs 20.4%, and 2.4% vs 0.6% on average in Kymice and humans, respectively) and lower usage of IGHD2, IGHD4, and IGHD5 genes (17.2% vs 5.8%, 9.1% vs 5.5%, and 9.5% vs 4.8% in humans and Kymice, respectively) (Figure 3—figure supplement 2).

The IGHJ gene usage profile is similar: Kymice and nine out of the ten humans form monophyletic clades with a single outlier human. On average, the Kymouse uses IGHJ4 more frequently than humans (46.7% vs 42.6%), and IGHJ5 and IGHJ1 less frequently (10.1% vs 13.5%, and 0.6% vs 1.0%) (Figure 3C and Figure 3—figure supplement 3).
Both the IGHV and IGHJ gene usage profiles of naïve Kymouse repertoires are more similar to one another than to any human repertoire. The IGHD gene usage is likewise distinguishable between humans and Kymice on average (Figure 3D). As can be seen from the heatmap, the IGHD germline genes used by the Kymice (e.g. IGHD3-22, IGHD2-15) are infrequently used by humans and vice versa. IGHD2, IGHD4, and IGHD5 subgroups are preferred by human repertoires (average 17.2% vs 5.8%, 9.1% vs 5.5%, and 9.5% vs 4.8% respectively in human vs Kymouse repertoires), while Kymouse repertoires preferentially use IGHD1, IGHD6, and IGHD7 (24.2% vs 10.3%, 27.4% vs 20.4%, and 2.4% vs 0.6% on average in Kymice and humans, respectively) (Figure 3—figure supplement 2). With the greater depth afforded by bulk VH sequencing, we were able to analyse further the differences in CDRH3 length we noted in the single-cell analysis. In addition to our comparison with naïve human repertoires, we compared the Kymouse repertoires with equivalent repertoires of C57BL/6 mice (non-mutated IgM sequences from naïve B cells). We compared the distribution of the CDRH3 lengths in each species’ naïve repertoire (Figure 4A). This revealed that Kymice have an average CDRH3 length in between that of humans and mice, with a mean CDRH3 length of 14.3 amino acids. In comparison, the C57BL/6 mouse dataset has a mean length of 12.4 amino acids, and the human dataset has a mean CDRH3 length of 16.6 amino acids. Kymouse CDRH3 loops are on average 2.36aa shorter than humans (95% CI: 2.26, 2.48; p<0.001), while C57BL/6 mice CDRH3 loops are on average 4.21aa shorter than humans (95% CI: 4.12, 4.30; p<0.001). The intra-species variance in CDRH3 length is small in comparison to the inter-species difference defined above (Figure 4—figure supplement 1). 
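A difference in mean CDRH3 length with a 95% confidence interval, as reported above, can be estimated in several ways. The text quoted here does not say which method was used, so the percentile bootstrap below is an assumption chosen for illustration, and the CDRH3 lengths are made-up toy values.

```python
import random
from statistics import mean


def bootstrap_mean_difference(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping the group sizes.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(mean(resample_a) - mean(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return mean(a) - mean(b), lower, upper


# Toy CDRH3 lengths in amino acids (illustration only, not study data).
human_lengths = [16, 17, 15, 18, 16, 17, 19, 14]
kymouse_lengths = [14, 15, 13, 14, 16, 15, 13, 14]

estimate, ci_low, ci_high = bootstrap_mean_difference(human_lengths, kymouse_lengths)
print(f"human minus Kymouse: {estimate:.2f}aa (95% CI: {ci_low:.2f}, {ci_high:.2f})")
```

With repertoires sampled from several individuals, resampling whole individuals rather than single sequences would respect the grouping of sequences within subjects; the sketch above ignores that structure for brevity.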
Figure 4 (with 2 supplements). The CDRH3 length distribution of the Kymouse is intermediate between humans and C57BL/6 mice primarily due to reduced VD and DJ insertion rates. The CDRH3 length distribution of the Kymouse (average 14.3aa) is intermediate between equivalent C57BL/6 repertoires (12.4aa) and human repertoires (16.6aa) (A). For each of five possible contributing factors, we us
