Abstract

SummaryHLA*LA implements a new graph alignment model for human leukocyte antigen (HLA) type inference, based on the projection of linear alignments onto a variation graph. It enables accurate HLA type inference from whole-genome (99% accuracy) and whole-exome (93% accuracy) Illumina data; from long-read Oxford Nanopore and Pacific Biosciences data (98% accuracy for whole-genome and targeted data) and from genome assemblies. Computational requirements for a typical sample vary between 0.7 and 14 CPU hours per sample.Availability and implementationHLA*LA is implemented in C++ and Perl and freely available as a bioconda package or from https://github.com/DiltheyLab/HLA-LA (GPL v3).Supplementary information Supplementary data are available at Bioinformatics online.

Highlights

  • Genetic variation at the human leukocyte antigen (HLA) loci is associated with many important phenotypes and biological conditions, including autoimmune and infectious disease risk, transplant rejection, and the repertoire of immune-presented peptides (Trowsdale and Knight, 2013)

  • On high-coverage Illumina whole-genome sequencing (WGS) data (1000 Genomes Project Consortium et al, 2012; Eberle et al, 2017), HLA*LA achieves a performance of 99.4% averaged over the six classical HLA genes

  • On whole-exome Illumina sequencing data (International HapMap, 2005), HLA*LA outperforms HLA*Population Reference Graphs (PRGs) (93% accuracy compared to 89%) and Kourami at matched call rates (95.3% at 88% call rate compared to 94.6% accuracy at 83% call rate)

Read more

Summary

Introduction

Genetic variation at the human leukocyte antigen (HLA) loci is associated with many important phenotypes and biological conditions, including autoimmune and infectious disease risk, transplant rejection, and the repertoire of immune-presented peptides (Trowsdale and Knight, 2013). Specialized HLA analysis methods, have been developed (Bai et al, 2014; Dilthey et al, 2016; Huang et al, 2015; Lee and Kourami, 2018; Wittig et al, 2015; Xie et al, 2017); these typically rely on variation-aware alignment approaches, e.g. genome graphs or collections of linear reference sequences. We present HLA*LA (‘linear alignments’), a graph-based method with high accuracy on exome and low-coverage WGS data, full support for assembled and unassembled long-read data and a new projection-based approach to graph alignment. The alignment process starts with the identification of linear alignments between the input reads and the reference haplotypes that the graph was constructed from; these are projected onto the graph, an approach used by Lee and Kourami, 2018, and optimized in a stepwise process specific to Population Reference Graphs (PRGs; Dilthey et al, 2015). Its hybrid approach enables HLA*LA to often avoid—potentially costly—full graph alignment, to leverage the performance of highly optimized linear alignment algorithms and to carry out graph alignment for both short and long reads in a unified framework

Materials and methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.