Abstract

BackgroundAlthough human leukocyte antigen (HLA) genotyping based on amplicon, whole exome sequence (WES), and RNA sequence data has been achieved in recent years, accurate genotyping from whole genome sequence (WGS) data remains a challenge due to the low depth. Furthermore, there is no method to identify the sequences of unknown HLA types not registered in HLA databases.ResultsWe developed a Bayesian model, called ALPHLARD, that collects reads potentially generated from HLA genes and accurately determines a pair of HLA types for each of HLA-A, -B, -C, -DPA1, -DPB1, -DQA1, -DQB1, and -DRB1 genes at 3rd field resolution. Furthermore, ALPHLARD can detect rare germline variants not stored in HLA databases and call somatic mutations from paired normal and tumor sequence data. We illustrate the capability of ALPHLARD using 253 WES data and 25 WGS data from Illumina platforms. By comparing the results of HLA genotyping from SBT and amplicon sequencing methods, ALPHLARD achieved 98.8% for WES data and 98.5% for WGS data at 2nd field resolution. We also detected three somatic point mutations and one case of loss of heterozygosity in the HLA genes from the WGS data.ConclusionsALPHLARD showed good performance for HLA genotyping even from low-coverage data. It also has a potential to detect rare germline variants and somatic mutations in HLA genes. It would help to fill in the current gaps in HLA reference databases and unveil the immunological significance of somatic mutations identified in HLA genes.

Highlights

  • Human leukocyte antigen (HLA) genotyping based on amplicon, whole exome sequence (WES), and RNA sequence data has been achieved in recent years, accurate genotyping from whole genome sequence (WGS) data remains a challenge due to the low depth

  • These can be generally separated into two categories: those based on amplicon sequencing of human leukocyte antigen (HLA) loci [6, 7] and others based on unbiased sequencing methods such as whole exome sequencing (WES) and RNA sequencing (RNAseq) [8,9,10,11,12,13,14,15]

  • To achieve high accuracy for WGS-based HLA genotyping and further analysis of HLA genes, we developed a series of computational methods, which involve collection of sequence reads that are potentially generated from a target HLA gene followed by HLA genotyping, using a novel Bayesian model termed ALelle Prediction in HLA Regions from sequence Data (ALPHLARD)

Read more

Summary

Introduction

Human leukocyte antigen (HLA) genotyping based on amplicon, whole exome sequence (WES), and RNA sequence data has been achieved in recent years, accurate genotyping from whole genome sequence (WGS) data remains a challenge due to the low depth. Generation sequencing-based approaches have been developed for HLA genotyping These can be generally separated into two categories: those based on amplicon sequencing of HLA loci [6, 7] and others based on unbiased sequencing methods such as whole exome sequencing (WES) and RNA sequencing (RNAseq) [8,9,10,11,12,13,14,15]. HLA genotyping from WGS data remains a significant challenge, this approach would provide more information of HLA loci than possible with WES and RNA-seq data, including details of the non-coding regions such as the introns and the untranslated regions

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call