Abstract

Inter-sample comparisons of T-cell receptor (TCR) repertoires are crucial for understanding the immunological states determined by collections of T cells from different donor sites, cell types, and genetic and pathological backgrounds. For quantitative comparison, most previous studies used conventional methods from ecology, which focus on the TCR sequences that overlap between pairs of samples. Some recent studies have taken another approach, based on Poisson abundance models, that uses the abundance distribution of observed TCR sequences. However, these methods ignore the details of the measured sequences and are consequently unable to identify sub-repertoires that might contribute substantially to the observed inter-sample differences. Moreover, the sparsity of sequence data caused by the huge diversity of repertoires hampers the performance of these methods, especially when few overlapping sequences exist. In this paper, we propose a new approach for REpertoire COmparison in Low Dimensions (RECOLD) based on TCR sequence information, which estimates a low-dimensional structure by embedding the pairwise sequence dissimilarities defined in the high-dimensional sequence space. The inter-sample differences between repertoires are then quantified by information-theoretic measures between the data distributions estimated in the embedded space. Using datasets of mouse and human TCR repertoires, we demonstrate that RECOLD can accurately identify inter-sample hierarchical structures that correspond well with our intuitive understanding of the sample conditions. Moreover, for the dataset of transgenic mice that have strong restrictions on the diversity of their repertoires, our estimated inter-sample structure was consistent with the structure estimated by previous methods based on abundance or overlapping sequence information. For the dataset of healthy human donors and Sézary syndrome patients, our method also showed robust estimation performance even under high sparsity in TCR sequences, whereas previous methods failed to estimate the structure. In addition, we identified the sequences that contribute to the pairwise-sample differences between the repertoires of mice with different genetic backgrounds. Such identification of the sequences contributing to variation in immune cell repertoires may provide substantial insight for the development of new immunotherapies and vaccines.
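The pipeline described above (pairwise dissimilarities, low-dimensional embedding, density estimation, information-theoretic comparison) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions made only for illustration: Levenshtein edit distance stands in for the alignment-based dissimilarity, classical multidimensional scaling stands in for the embedding step, a Gaussian kernel estimate stands in for the density estimation, and the Jensen-Shannon divergence is one possible information-theoretic measure. All function names and the toy sequences are hypothetical.

```python
import math

def edit_distance(a, b):
    """Levenshtein distance (assumed stand-in for an alignment-based dissimilarity)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def classical_mds(dist, dims=2, iters=500):
    """Classical MDS: double-center squared distances, then extract the
    leading eigenvectors by power iteration with deflation."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [math.sin(i + 1 + k) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(x * y for x, y in zip(v, bv))  # Rayleigh quotient
        if lam > 1e-12:  # keep only positive-eigenvalue axes
            for i in range(n):
                coords[i][k] = math.sqrt(lam) * v[i]
        for i in range(n):  # deflate so the next pass finds the next axis
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords

def kde_on_points(sample, grid, bandwidth):
    """Gaussian kernel density of `sample` evaluated at `grid`, normalised to sum to 1."""
    dens = [sum(math.exp(-sum((g - x) ** 2 for g, x in zip(gp, sp)) / (2 * bandwidth ** 2))
                for sp in sample) for gp in grid]
    z = sum(dens)
    return [d / z for d in dens]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (between 0 and 1)."""
    def kl(a, m):
        return sum(x * math.log2(x / y) for x, y in zip(a, m) if x > 0)
    mix = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

def compare_repertoires(rep_a, rep_b, dims=2, bandwidth=1.0):
    """Embed the pooled sequences, then compare per-sample densities in the embedded space."""
    pooled = rep_a + rep_b
    dist = [[edit_distance(s, t) for t in pooled] for s in pooled]
    emb = classical_mds(dist, dims)
    emb_a, emb_b = emb[:len(rep_a)], emb[len(rep_a):]
    return js_divergence(kde_on_points(emb_a, emb, bandwidth),
                         kde_on_points(emb_b, emb, bandwidth))

# Invented toy CDR3-like strings, not data from the study.
rep_a = ["CASSLGQGNTEAFF", "CASSLGQGYTEAFF", "CASSLAQGNTEAFF"]
rep_b = ["CAWSVDRGQPQHF", "CAWSVDRGEPQHF", "CAWSADRGQPQHF"]
d_aa = compare_repertoires(rep_a, rep_a)
d_ab = compare_repertoires(rep_a, rep_b)
```

In this sketch, a repertoire compared with itself yields a divergence of zero, and the divergence grows as the two samples occupy different regions of the embedded space, even when they share no identical sequence. The bandwidth and the number of embedding dimensions are free parameters here and would need to be chosen for real data.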

Highlights

  • The development of high-throughput sequencing with next-generation sequencers has provided new opportunities to quantify T-cell receptor (TCR) repertoires and to compare their differences among different cell types, organisms, and pathological samples

  • We here propose a new dimensionality-reduction-based method for REpertoire COmparison in Low Dimensions (RECOLD): we focus on the sequence information in all samples and estimate the low-dimensional representation by projecting and embedding the high-dimensional inter-sequence relations, calculated from pairwise sequence alignments, onto a low-dimensional space

  • We quantified the difference in TCR repertoires among different samples based on amino acid sequence dissimilarity

Introduction

The development of high-throughput sequencing with next-generation sequencers has provided new opportunities to quantify T-cell receptor (TCR) repertoires and to compare their differences among different cell types, organisms, and pathological samples. Detailed information on TCR repertoires, especially on inter-sample differences, is anticipated to significantly promote the development of immunotherapies and vaccines [1, 2]. To this end, several statistical and computational methods have been proposed to quantify sample differences. Even if no overlap among the sequences is detected, it is not possible to judge whether this occurs because the two repertoires cover quite different subspaces of the sequence space or because the repertoires cover the same subspace but show no overlap by chance owing to the sparsity of the coverage.
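The ambiguity of a zero-overlap result can be made concrete with a small sketch. The Jaccard and Morisita-Horn indices are used here only as familiar examples of overlap-based ecological measures. The text does not say which indices earlier studies used, so this choice is an assumption, as are the function names and the invented toy sequences.

```python
from collections import Counter

def jaccard_index(a, b):
    """Fraction of distinct sequences shared between two samples."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def morisita_horn(a, b):
    """Morisita-Horn similarity computed from sequence counts."""
    ca, cb = Counter(a), Counter(b)
    na, nb = sum(ca.values()), sum(cb.values())
    if na == 0 or nb == 0:
        return 0.0
    shared = sum(ca[s] * cb[s] for s in ca.keys() & cb.keys())
    da = sum(c * c for c in ca.values()) / na ** 2
    db = sum(c * c for c in cb.values()) / nb ** 2
    return 2 * shared / ((da + db) * na * nb)

# Invented toy CDR3-like strings, not data from the study.
sample = ["CASSLGQGNTEAFF", "CASSLGQGNTEAFF", "CASSLAQGNTEAFF"]
near = ["CASSLGQGYTEAFF", "CASSLAQGYTEAFF"]  # one residue away from `sample`
far = ["CAWSVDRGQPQHF", "CAWSADRGQPQHF"]     # unrelated sequences

overlap_near = (jaccard_index(sample, near), morisita_horn(sample, near))
overlap_far = (jaccard_index(sample, far), morisita_horn(sample, far))
```

Both comparisons score zero on both indices, because neither `near` nor `far` shares an identical sequence with `sample`. An overlap-based measure therefore cannot distinguish a repertoire that sits in the same region of sequence space from one that sits far away.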
