Abstract
Correctly matching the HLA haplotypes of donor and recipient is essential to the success of allogeneic hematopoietic stem cell transplantation. Current HLA typing methods rely on targeted testing of recognized antigens or sequences. Despite advances in next-generation sequencing, general high-throughput transcriptome sequencing is currently underutilized for HLA haplotyping because of the central difficulty of aligning sequences within this highly variable region. Here we present HLAforest, a method that accurately predicts HLA haplotypes by hierarchically weighting reads and applying an iterative, greedy, top-down pruning technique. HLAforest correctly predicts >99% of allele-group-level (2-digit) haplotypes and 93% of peptide-level (4-digit) haplotypes of the most diverse HLA genes in simulations with read lengths and error rates modeling currently available sequencing technology. The method is robust to sequencing error and predicts 99% of allele-group-level haplotypes at substitution rates as high as 8.8%. When applied to data generated from a trio of cell lines, HLAforest corroborated PCR-based HLA haplotyping methods and accurately predicted 16/18 (89%) major class I genes for a daughter-father-mother trio at the peptide level. Major class II genes were predicted with 100% concordance within the daughter-father-mother trio. In fifty HapMap samples with paired-end reads just 37 nucleotides long, HLAforest predicted 96.5% of allele-group-level HLA haplotypes and 83% of peptide-level haplotypes correctly. In sixteen RNA-seq samples with limited coverage across HLA genes, HLAforest predicted 97.7% of allele-group-level haplotypes and 85% of peptide-level haplotypes correctly.
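The abstract names hierarchical read weighting and iterative, greedy, top-down pruning without giving details. The Python sketch below illustrates the general idea only and should not be read as HLAforest's actual implementation. It assumes a hypothetical input in which each read is the set of allele names it aligned to, and an assumed weighting in which each read carries a total weight of 1 split evenly across its candidates. The function names `prefix`, `greedy_call`, and `call_pair` are invented for this illustration.

```python
from collections import defaultdict

def prefix(allele, level):
    """Truncate 'A*02:01' to level 1 ('A*02') or level 2 ('A*02:01')."""
    gene, fields = allele.split("*")
    return gene + "*" + ":".join(fields.split(":")[:level])

def greedy_call(reads, levels=(1, 2)):
    """Walk down the naming hierarchy, keeping the heaviest node at each level."""
    chosen = None
    for level in levels:
        scores = defaultdict(float)
        for candidates in reads:
            for allele in candidates:
                # Assumed weighting: each read has total weight 1,
                # split evenly across the alleles it aligns to.
                scores[prefix(allele, level)] += 1.0 / len(candidates)
        if not scores:
            return None
        # Ties are broken alphabetically so the result is deterministic.
        chosen = max(sorted(scores), key=scores.get)
        # Prune: keep only candidates under the chosen node, drop emptied reads.
        pruned = ({a for a in c if prefix(a, level) == chosen} for c in reads)
        reads = [kept for kept in pruned if kept]
    return chosen

def call_pair(reads):
    """Call one allele, set aside the reads that support it, then call again."""
    first = greedy_call(reads)
    rest = [c for c in reads if first not in c]
    # If no reads remain, assume the sample is homozygous for the first call.
    second = greedy_call(rest) or first
    return first, second

# Hypothetical alignments for a single gene (HLA-A).
reads = [
    {"A*02:01", "A*02:05"},
    {"A*02:01"},
    {"A*02:01"},
    {"A*02:01", "A*24:02"},
    {"A*24:02"},
    {"A*24:02", "A*24:03"},
]
print(call_pair(reads))
```

Choosing the allele group first and only then the 4-digit allele means that reads ambiguous between closely related alleles still count toward their shared group, which is one plausible reason a top-down approach would be more accurate at the 2-digit level than at the 4-digit level.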
Highlights
Hematopoietic stem cell transplantation (HSCT) has successfully treated a wide variety of diseases including autoimmune disorders, rare genetic diseases and blood cancers [1]
Simulations were performed only on the Human Leukocyte Antigen (HLA) genes HLA-A, HLA-B, HLA-C, and HLA-DRB1, as these genes represent the majority of the diversity present in the IMGT database (Figure S1)
Fifty HapMap samples for which both Sanger-sequencing-based HLA haplotypes and RNA-seq data are available were analyzed using HLAforest
Summary
Hematopoietic stem cell transplantation (HSCT) has successfully treated a wide variety of diseases including autoimmune disorders, rare genetic diseases and blood cancers [1]. Matching a donor and recipient requires HLA haplotyping, which typically relies on specialized antibody or targeted DNA-based tests [3]. The use of untargeted RNA-seq data for HLA haplotyping has not seen much development despite dramatic reductions in sequencing costs. This is unfortunate, as RNA-seq assays have proven to be useful tools in personalized medicine for the subclassification of cancers including breast, prostate, leukemias and lymphomas [11,12,13,14,15]. For a patient who would benefit from RNA sequencing and whose disease requires HSCT, it would be cost-effective to predict HLA haplotypes directly from the RNA-seq data rather than perform an additional specialized test to determine HLA haplotype.