Abstract

Rationale: We have previously demonstrated that bioinformatics tools such as artificial neural networks (ANNs) are capable of performing pathogen‐, genome‐ and HLA‐wide predictions of peptide–HLA interactions. These tools may therefore enable a fast and rational approach to epitope identification and thereby assist in the development of vaccines and immunotherapy. A crucial step in the generation of such bioinformatics tools is the selection of data representing the event in question (in this case, peptide–HLA interaction). This is particularly important when data are difficult and expensive to obtain. Herein, we demonstrate the importance of selecting information‐rich data, and we develop a computational method, query‐by‐committee (QBC), which can perform a global identification of such information‐rich data in an unbiased and automated manner. Furthermore, we demonstrate how this method can be applied to an efficient iterative development strategy for these bioinformatics tools.

Methods: A large panel of binding affinities of peptides binding to HLA A*0204 was measured by a radioimmunoassay (RIA). These data were used to develop multiple first‐generation ANNs, which formed a virtual committee. The committee was then 'queried' to screen for peptides on whose HLA‐binding potential the ANNs agreed ('low‐QBC') or disagreed ('high‐QBC'). Seventeen low‐QBC peptides and 17 high‐QBC peptides were synthesized and tested. The high‐QBC or low‐QBC data were added to the original data, and new high‐QBC or low‐QBC second‐generation ANNs were developed, respectively. This procedure was repeated 40 times.

Results: The high‐QBC‐enriched ANN performed significantly better than the low‐QBC‐enriched ANN in 37 of the 40 tests.

Conclusion: These results demonstrate that high‐QBC‐enriched networks perform better than low‐QBC‐enriched networks in selecting informative data for developing peptide–MHC‐binding predictors. This improvement in selecting data is due not to differences in network training performance but to the difference in information content between the high‐QBC experiment and the low‐QBC experiment. Finally, it should be noted that this strategy could be used in many contexts where generation of data is difficult and costly.
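
The committee query described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that disagreement is measured as the standard deviation of the committee members' predicted binding scores, a measure the abstract does not specify, and the function names `qbc_scores` and `select_by_qbc` and the toy numbers are hypothetical.

```python
from statistics import pstdev


def qbc_scores(committee_predictions):
    """Return one disagreement score per candidate peptide.

    committee_predictions holds one row per committee member (ANN) and
    one column per candidate peptide. Disagreement is taken here to be
    the population standard deviation across members (an assumption).
    """
    # zip(*rows) transposes the table so each tuple holds all members'
    # predictions for a single candidate.
    return [pstdev(column) for column in zip(*committee_predictions)]


def select_by_qbc(committee_predictions, k):
    """Return (high_qbc, low_qbc, scores) with k candidate indices each."""
    scores = qbc_scores(committee_predictions)
    # Candidate indices ordered from least to most disagreement.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    low_qbc = ranked[:k]          # committee agrees
    high_qbc = ranked[::-1][:k]   # committee disagrees
    return high_qbc, low_qbc, scores


# Toy data: 3 committee members scoring 4 candidate peptides.
toy_predictions = [
    [0.9, 0.2, 0.5, 0.7],
    [0.9, 0.8, 0.5, 0.3],
    [0.9, 0.5, 0.6, 0.4],
]
high_qbc, low_qbc, scores = select_by_qbc(toy_predictions, k=1)
print("high-QBC candidates:", high_qbc)
print("low-QBC candidates:", low_qbc)
```

In the study, 17 peptides were drawn from each end of such a ranking, measured experimentally, and added to the training data before the next generation of networks was developed.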
