Abstract

Background: Human leukocyte antigen (HLA) genes are critical genes involved in important biomedical aspects, including organ transplantation, autoimmune diseases and infectious diseases. The gene family contains the most polymorphic genes in humans and the difference between two alleles is only a single base pair substitution in many cases. The next generation sequencing (NGS) technologies could be used for high throughput HLA typing but in silico methods are still needed to correctly assign the alleles of a sample. Computer scientists have developed such methods for various NGS platforms, such as Illumina, Roche 454 and Ion Torrent, based on the characteristics of the reads they generate. However, the method for PacBio reads was less addressed, probably owing to its high error rates. The PacBio system has the longest read length among available NGS platforms, and therefore is the only platform capable of having exon 2 and exon 3 of HLA genes on the same read to unequivocally solve the ambiguity problem caused by the “phasing” issue.

Results: We proposed a new method BayesTyping1 to assign HLA alleles for PacBio circular consensus sequencing reads using Bayes’ theorem. The method was applied to simulated data of the three loci HLA-A, HLA-B and HLA-DRB1. The experimental results showed its capability to tolerate the disturbance of sequencing errors and external noise reads.

Conclusions: The BayesTyping1 method could overcome the problems of HLA typing using PacBio reads, which mostly arise from sequencing errors of PacBio reads and the divergence of HLA genes, to some extent.

Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-15-296) contains supplementary material, which is available to authorized users.

Highlights

  • Human leukocyte antigen (HLA) genes are critical genes involved in important biomedical aspects, including organ transplantation, autoimmune diseases and infectious diseases

  • We compared the first method BayesTyping0 with NGSengine, which is a platform-independent software for next generation sequencing (NGS) data analysis of HLA genes

  • The experimental results showed that BayesTyping1 can identify HLA alleles accurately using a reasonably low number of Pacific Biosciences SMRT (PacBio) circular consensus sequencing (CCS) reads


Summary

Introduction

Human leukocyte antigen (HLA) genes are critical genes involved in important biomedical aspects, including organ transplantation, autoimmune diseases and infectious diseases. The gene family contains the most polymorphic genes in humans, and in many cases two alleles differ by only a single base pair substitution. Next generation sequencing (NGS) technologies could be used for high throughput HLA typing, but in silico methods are still needed to correctly assign the alleles of a sample. The main function of MHC molecules is to mediate interactions between antigen-presenting cells, various lymphocytes and other body cells; malfunctions of HLA may be associated with certain disorders of the immune system, for example drug hypersensitivity reactions [1] and some autoimmune diseases such as type 1 diabetes and systemic lupus erythematosus [2]. The exon 2 and exon 3 sequences of class I HLA genes and the exon 2 sequence of class II HLA genes form the critical peptide-binding groove responsible for the specificity of peptide recognition and binding [7].
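
To make the idea of assigning alleles with Bayes' theorem more concrete, the following is a minimal sketch and not the BayesTyping1 method itself, whose details are not given in this summary. It assumes a uniform prior over unordered allele pairs, reads already aligned to the same length as the candidate alleles, and an independent per-base substitution error rate. The allele names (`toy1` to `toy3`), the sequences, and the function names are all hypothetical.

```python
import math
from itertools import combinations_with_replacement


def read_log_likelihood(read, allele, error_rate):
    """log P(read | allele), assuming independent substitution errors only."""
    if len(read) != len(allele):
        raise ValueError("this sketch assumes reads aligned to full allele length")
    mismatches = sum(r != a for r, a in zip(read, allele))
    matches = len(read) - mismatches
    # A wrong base is assumed to be any of the 3 other bases with equal probability.
    return matches * math.log(1 - error_rate) + mismatches * math.log(error_rate / 3)


def genotype_posteriors(reads, alleles, error_rate=0.02):
    """Posterior over unordered allele pairs, using a uniform prior (assumption)."""
    log_post = {}
    for a, b in combinations_with_replacement(sorted(alleles), 2):
        total = 0.0
        for read in reads:
            la = read_log_likelihood(read, alleles[a], error_rate)
            lb = read_log_likelihood(read, alleles[b], error_rate)
            # Each read is assumed to come from either chromosome with probability 1/2.
            m = max(la, lb)
            total += m + math.log(0.5 * math.exp(la - m) + 0.5 * math.exp(lb - m))
        log_post[(a, b)] = total
    # Normalise in log space (Bayes' theorem with a constant prior).
    m = max(log_post.values())
    norm = m + math.log(sum(math.exp(v - m) for v in log_post.values()))
    return {pair: math.exp(v - norm) for pair, v in log_post.items()}


# Hypothetical toy alleles that differ by single substitutions.
toy_alleles = {
    "toy1": "ACGTACGT",
    "toy2": "ACGTACGA",
    "toy3": "ACCTACGT",
}
# Three reads from toy1, three from toy2, and one toy2 read with a sequencing error.
toy_reads = ["ACGTACGT"] * 3 + ["ACGTACGA"] * 3 + ["TCGTACGA"]

posteriors = genotype_posteriors(toy_reads, toy_alleles)
best_pair = max(posteriors, key=posteriors.get)
print(best_pair, round(posteriors[best_pair], 4))
```

Scoring allele pairs rather than single alleles is what lets this toy model separate a heterozygous sample from a homozygous one. A realistic typing method would also have to handle insertions and deletions, partial reads, and noise reads from other loci, none of which are modelled here.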
