Abstract
Successful understanding and treatment of immune diseases, spanning autoimmunity, cancer, and infectious disease, rely on knowing what the immune system sees. For T cell responses, this is controlled by major histocompatibility complex (MHC) molecules, specialized polymorphic proteins that present a snapshot of the proteome of their parent cell or its surroundings. Importantly, MHC polymorphisms cluster at the peptide-binding surface, where even single substitutions can greatly alter the peptide-MHC (pMHC) repertoire seen by T cells. Due to extreme person-to-person variability in MHC usage, data on peptide-MHC specificity are largely limited to a handful of the most prevalent alleles. While algorithms that predict pMHC interactions perform well for these alleles, they can be inaccurate for less-studied alleles, due in part to biases and limited diversity in their training data. We present a platform that provides large, unbiased data sets to improve pMHC repertoire definitions and predictions. Using yeast-displayed MHC molecules presenting a random peptide library, we identified hundreds of thousands of unique peptide binders for both well- and poorly studied human MHC alleles. Our data demonstrate strong amino acid preferences at defined “anchor” positions for each allele, and also reveal preferences at auxiliary sites that most prediction software does not consider. We also find noncanonical amino acid preferences at defined anchor positions that confound current software, suggesting alternative peptide conformations. These data show the importance of unbiased pMHC repertoires for improving existing antigen prediction software, and suggest that our approach can be used to characterize presently understudied MHC alleles.