Objective: To establish prediction models for human leukocyte antigen (HLA) haplotypes and HLA genotypes, and to verify their prediction accuracy. Methods: The prediction models were established based on the characteristics of HLA haplotype inheritance and linkage disequilibrium (LD), as well as on the invention patents and software copyrights obtained. The models comprise an algorithm and reference databases, including an HLA A-C-B-DRB1-DQB1 high-resolution haplotype database, a B-C and DRB1-DQB1 LD database, a G group allele table, and an NMDP Code allele table. The prediction algorithm involves data processing, comparison with reference data, filtering of results, probability calculation and ranking, confidence degree estimation, and output of the prediction results. Prediction accuracy was verified by comparing the predictions with the correct results, and the relationship between prediction accuracy and the probability distribution and confidence degree of the predicted results was analyzed. Results: The HLA haplotype and genotype prediction models were established. The prediction algorithm covered three tasks: prediction of A-C-B-DRB1-DQB1 haplotypes from HLA-A, B, DRB1, C, and DQB1 genotypes; prediction of C and DQB1 high-resolution results from A, B, and DRB1 high-resolution results; and prediction of A, B, DRB1, C, and DQB1 high-resolution results from A, B, and DRB1 intermediate- or low-resolution results. Validation results of the "Predicting A-C-B-DRB1-DQB1 haplotypes based on HLA-A, B, DRB1, C, DQB1 genotypes" model: in the first validation data set (n = 787), the accuracy was 94.0% (740/787), with 740 correct predictions, 34 incorrect predictions, and 13 instances with no predicted result. In the second validation data set (n = 847), the accuracy was 100% (847/847).
When the 2 411 and 2 594 haplotype combinations predicted from the two data sets (n = 787 and n = 847) were grouped by confidence degree, the accuracy was 100% in both (48/48 and 114/114) for a confidence degree of 1, and 96.2% (303/315) and 97.8% (409/418), respectively, for a confidence degree of 2. Validation results of the "Predicting A, B, DRB1 and C, DQB1 high-resolution genotypes based on HLA-A, B, DRB1 high-, intermediate-, or low-resolution genotypes" model: when C and DQB1 high-resolution genotypes were predicted from A, B, and DRB1 high-resolution genotypes, 89.3% (1 459/1 634) of the predictions were correct. Among these correct predictions, 79.2% (1 156/1 459) ranked within the top 2 by predicted probability (GPP), and 95.0% (1 386/1 459) ranked within the top 10. Furthermore, the prediction accuracy was 81.3% (209/257) when GPP was ≥90% and 72.8% (447/614) when GPP was 50%-90%. The accuracy of predicting C and DQB1 high-resolution genotypes from the A, B, and DRB1 high-resolution genotypes of the China Marrow Donor Program was 87.0% (20/23). The accuracy of predicting A, B, DRB1, C, and DQB1 high-resolution genotypes from A, B, and DRB1 intermediate-resolution and low-resolution genotypes was 70.0% (7/10) and 52.5% (21/40), respectively. When predicting whether a patient is likely to have an HLA 10/10 matched donor, the accuracy of the top 2 GPP combinations with a proportion of ≥50% was 85.7% (6/7). Conclusions: When A, B, DRB1, C, and DQB1 genotypes are used to predict A-C-B-DRB1-DQB1 haplotype combinations, results with a confidence degree of 1 or 2 are reliable. When C and DQB1 genotypes are predicted from A, B, and DRB1 genotypes, the top 10 results ranked by GPP are reliable, and the top 2 results with GPP≥50% are more reliable.
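The "comparison with reference data, filtering, probability calculation and ranking" steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a hypothetical reference table `REF_FREQ` with made-up haplotype frequencies, assumes Hardy-Weinberg proportions when weighting haplotype pairs, and leaves out the LD database, the G group and NMDP Code tables, and the confidence degree, none of which are specified in the abstract. The function name `rank_haplotype_pairs` is likewise illustrative.

```python
from itertools import product

LOCI = ("A", "C", "B", "DRB1", "DQB1")

# Hypothetical reference haplotype frequencies (illustrative values only,
# not taken from the study's reference database).
REF_FREQ = {
    ("A*02:01", "C*07:02", "B*07:02", "DRB1*15:01", "DQB1*06:02"): 0.04,
    ("A*01:01", "C*07:01", "B*08:01", "DRB1*03:01", "DQB1*02:01"): 0.03,
    ("A*02:01", "C*07:01", "B*08:01", "DRB1*03:01", "DQB1*02:01"): 0.01,
    ("A*01:01", "C*07:02", "B*07:02", "DRB1*15:01", "DQB1*06:02"): 0.005,
}


def rank_haplotype_pairs(genotype, ref_freq=REF_FREQ):
    """Rank haplotype pairs consistent with an unphased five-locus genotype.

    genotype maps each locus name to a tuple of its two alleles.
    Returns a list of (haplotype_pair, probability), most probable first.
    """
    weights = {}
    # Enumerate every way of assigning one allele per locus to haplotype 1;
    # the remaining allele at each locus goes to haplotype 2.
    for choice in product((0, 1), repeat=len(LOCI)):
        h1 = tuple(genotype[locus][c] for locus, c in zip(LOCI, choice))
        h2 = tuple(genotype[locus][1 - c] for locus, c in zip(LOCI, choice))
        # Filter: keep only pairs whose haplotypes both occur in the reference.
        if h1 not in ref_freq or h2 not in ref_freq:
            continue
        pair = tuple(sorted((h1, h2)))
        if pair in weights:
            continue  # same pair reached with h1 and h2 swapped
        # Assumed Hardy-Weinberg weighting: 2*f1*f2 for distinct haplotypes.
        factor = 1 if h1 == h2 else 2
        weights[pair] = factor * ref_freq[h1] * ref_freq[h2]
    total = sum(weights.values())
    ranked = [(pair, w / total) for pair, w in weights.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


if __name__ == "__main__":
    example = {
        "A": ("A*01:01", "A*02:01"),
        "C": ("C*07:01", "C*07:02"),
        "B": ("B*07:02", "B*08:01"),
        "DRB1": ("DRB1*03:01", "DRB1*15:01"),
        "DQB1": ("DQB1*02:01", "DQB1*06:02"),
    }
    for pair, prob in rank_haplotype_pairs(example):
        print(f"{prob:.3f}", pair)
```

With these invented frequencies, two haplotype pairs are compatible with the example genotype, and the normalized probabilities separate them clearly. A genotype with no compatible pair in the reference table yields an empty list, which corresponds to the "no predicted result" outcome reported in the validation.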