Abstract

Background
Increased hydrophobicity of the contact regions between MHC class I-restricted peptides and the TCR is associated with increased immunogenicity. To improve prediction of antigenic peptides and to identify additional biochemical properties associated with immunogenicity, we have created a random forest (RF)-based computational model that evaluates target epitopes using 66 parameters that cover a range of amino acid biochemical properties.

Method
A dataset of publicly available 9-mer epitopes with experimentally defined immunogenicity (n=29,874) and/or HLA binding (n=108,069) was collected from IEDB and the MassIVE database. Epitopes restricted to target HLA alleles were scored based on the biochemical parameters of each amino acid and used to train an RF model specific to that HLA allele. For biochemical property analysis, redundant features were removed with a wrapper algorithm before the remaining features were used to train the RF model.

Results
The RF model specific for HLA-A*02:01 (n=5500) was evaluated for HLA binding and immunogenicity prediction accuracy using a mutually exclusive testing set of HLA-restricted peptides that were non-immunogenic, immunogenic, or non-binders. The model showed an immunogenicity prediction accuracy of 83% with an AUC of 0.88, and an HLA binding prediction accuracy of 97.4%, with a sensitivity of 95.2% and a specificity of 99.9%. The RF model confirmed that hydrophobic interactions at peptide positions 4 and 8, as well as H-bonding at position 1, were important for immunogenicity.

Conclusion
RF-based algorithms can be used to identify biochemical parameters associated with immunogenic class I-restricted T cell epitopes. This model is being expanded to other HLA alleles.
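The per-residue scoring step described in the Method can be illustrated with a minimal sketch. The abstract does not list the 66 parameters, so the two properties used here (the Kyte-Doolittle hydropathy index and a simplified side-chain charge at neutral pH) are stand-ins chosen for illustration, and the function names `encode_9mer` and `feature_names` are hypothetical. The peptide GILGFVFTL, a well-known HLA-A*02:01-restricted influenza epitope, is used only as example input.

```python
# Illustrative sketch only: these two properties are assumptions, not the
# 66 parameters used by the authors (which the abstract does not list).

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge at neutral pH (histidine treated as neutral).
SIDE_CHAIN_CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}

PROPERTIES = ("hydropathy", "charge")


def encode_9mer(peptide):
    """Return a flat feature vector: one value per property per position."""
    peptide = peptide.upper()
    if len(peptide) != 9:
        raise ValueError("expected a 9-mer, got length %d" % len(peptide))
    unknown = set(peptide) - set(KD_HYDROPATHY)
    if unknown:
        raise ValueError("non-standard residues: %s" % sorted(unknown))
    features = []
    for residue in peptide:
        features.append(KD_HYDROPATHY[residue])
        features.append(SIDE_CHAIN_CHARGE.get(residue, 0.0))
    return features


def feature_names():
    """Names such as 'P4_hydropathy', aligned with encode_9mer output."""
    return ["P%d_%s" % (pos, prop) for pos in range(1, 10) for prop in PROPERTIES]


if __name__ == "__main__":
    vector = encode_9mer("GILGFVFTL")
    for name, value in zip(feature_names(), vector):
        print(name, value)
```

Vectors of this kind, one per peptide and labeled as immunogenic or non-immunogenic, could then be passed to any random forest implementation; position-specific feature names make it possible to read importance scores back in terms of peptide positions such as 4 and 8.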

