Abstract
Probabilistic models of adaptive immune repertoire sequence distributions can be used to infer the expansion of immune cells in response to a stimulus, differentiate genetic from environmental factors that determine repertoire sharing, and evaluate the suitability of various target immune sequences for stimulation via vaccination. Classically, these models are defined in terms of a probabilistic V(D)J recombination model which is sometimes combined with a selection model. In this paper we take a different approach, fitting variational autoencoder (VAE) models parameterized by deep neural networks to T cell receptor (TCR) repertoires. We show that simple VAE models can perform accurate cohort frequency estimation, learn the rules of V(D)J recombination, and generalize well to unseen sequences. Further, we demonstrate that VAE-like models can distinguish between real sequences and sequences generated according to a recombination-selection model, and that many characteristics of VAE-generated sequences are similar to those of real sequences.
Highlights
T cell receptors (TCRs) are composed of an α and a β protein chain, both originating from a random V(D)J recombination process, followed by selective steps that ensure functionality and limit autoreactivity
We model TCR sequences using simple variants of variational autoencoders (VAEs)
VAE models can be described as consisting of an n-dimensional latent space, a prior p(z) on that latent space, and probabilistic maps parameterized by two neural networks: an encoder q_φ(z|x) and a decoder p(x̂|z) (Figure 1; Kingma et al., 2014b)
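The VAE structure named in the highlights (latent space, prior, encoder, decoder) can be illustrated with a minimal NumPy sketch. This is not the architecture used in the paper: the alphabet, the padded length `MAX_LEN`, the latent size `LATENT_DIM`, the single linear layers, the class `TinyVAE`, and the example sequence are all assumptions made for illustration, and the weights are random rather than trained. The sketch only shows how an encoder, a standard normal prior, and a decoder combine into a one-sample estimate of the evidence lower bound (ELBO).

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"  # 20 residues plus a pad symbol
MAX_LEN = 30       # assumed fixed, padded sequence length
LATENT_DIM = 20    # n, the dimension of the latent space


def one_hot(seq):
    """Encode an amino-acid string as a (MAX_LEN, 21) one-hot matrix."""
    padded = seq.ljust(MAX_LEN, "-")
    x = np.zeros((MAX_LEN, len(AMINO_ACIDS)))
    for i, aa in enumerate(padded):
        x[i, AMINO_ACIDS.index(aa)] = 1.0
    return x


class TinyVAE:
    """Hypothetical single-layer VAE with random (untrained) weights."""

    def __init__(self, rng):
        d = MAX_LEN * len(AMINO_ACIDS)
        self.W_mu = rng.normal(0.0, 0.01, (d, LATENT_DIM))
        self.W_logvar = rng.normal(0.0, 0.01, (d, LATENT_DIM))
        self.W_dec = rng.normal(0.0, 0.01, (LATENT_DIM, d))

    def encode(self, x):
        """Encoder q(z|x): mean and log-variance of a diagonal Gaussian."""
        flat = x.reshape(-1)
        return flat @ self.W_mu, flat @ self.W_logvar

    def decode(self, z):
        """Decoder p(x_hat|z): a categorical distribution per position."""
        logits = (z @ self.W_dec).reshape(MAX_LEN, len(AMINO_ACIDS))
        logits = logits - logits.max(axis=1, keepdims=True)  # stable softmax
        probs = np.exp(logits)
        return probs / probs.sum(axis=1, keepdims=True)

    def elbo(self, x, rng):
        """One-sample ELBO estimate under a standard normal prior p(z)."""
        mu, logvar = self.encode(x)
        # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I).
        z = mu + np.exp(0.5 * logvar) * rng.standard_normal(LATENT_DIM)
        x_hat = self.decode(z)
        log_lik = np.sum(x * np.log(x_hat))
        # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian.
        kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
        return log_lik - kl


rng = np.random.default_rng(0)
vae = TinyVAE(rng)
x = one_hot("CASSLGQGYEQYF")  # example CDR3-like string, for illustration only
print(vae.elbo(x, rng))
```

Training would adjust the three weight matrices to maximize this quantity over a repertoire; that step is omitted here to keep the sketch short.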
Summary
T cell receptors (TCRs) are composed of an α and a β protein chain, both originating from a random V(D)J recombination process, followed by selective steps that ensure functionality and limit autoreactivity. To generate diverse and functional TCRs, T cells combine a stochastic process for choosing from a pool of V, D and J genes with a process for selecting for expression and MHC recognition. The resulting ensemble of protein sequences summarizes each individual’s previous immune exposures and largely determines their resistance to various infections. One can consider these protein sequences as a sample from a probability distribution, whether it is the distribution of receptors within an individual, or the distribution of receptors in a population. This article concerns fitting such probability distributions on TCR β protein sequences (which will be called ‘TCR sequences’ for the rest of the paper).