Abstract

Aim
Nanopore-based sequencing has advanced rapidly in recent years through iterations in pore structure, sequencing chemistries and base callers, leading to increasingly accurate sequencing of extremely long DNA molecules. Because current HLA genotyping algorithms are not optimized for nanopore data, we developed a genotyping algorithm based on read grouping and subsequent mapping to a reduced reference database.

Methods
High raw sequencing error rates make genotyping by direct mapping to a reference database challenging. Our algorithm, Poretyper, avoids this initial mapping by first grouping the raw reads according to the distribution of k-mers within each read. A multiple sequence alignment derived from the groups of raw reads yields a set of consensus sequences that represent potential alleles. These consensus sequences are then used to build a culled reference database, which dramatically reduces the search space and thereby the number of artefactual raw read mappings.

Results
HLA-A, HLA-B and HLA-C were sequenced for one hundred samples drawn from the DKMS donor registry using the MinION with R9.5 chemistry. All one hundred samples were then genotyped with Poretyper, and the existing G-group pretypings were recapitulated, failing only for those alleles for which no full-length sequences were available.

Conclusions
Nanopore sequencing is a viable and accurate platform for cost-efficient full-length HLA Class I genotyping. For alleles without full-length reference sequences, extending the allele sequences in silico with the full-length sequence of the next closest allele is a viable approach to full-length genotyping.
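The read-grouping idea described under Methods (clustering raw reads by their k-mer distributions before any reference mapping) can be illustrated with a minimal sketch. This is not Poretyper's implementation: the abstract does not state how reads are compared or clustered. The sketch assumes k-mer count profiles compared by cosine similarity and a greedy, single-pass threshold clustering; the functions `kmer_profile`, `cosine` and `group_reads`, and the parameters `k` and `threshold`, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers in a read (empty if the read is shorter than k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse k-mer count vectors."""
    # Counter returns 0 for missing keys, so k-mers absent from b add nothing.
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_reads(reads: list[str], k: int = 5, threshold: float = 0.8) -> list[list[int]]:
    """Hypothetical greedy grouping: each read joins the first group whose
    representative (the profile of that group's first read) is at least
    `threshold` similar; otherwise it starts a new group."""
    groups: list[tuple[Counter, list[int]]] = []
    for idx, read in enumerate(reads):
        profile = kmer_profile(read, k)
        for representative, members in groups:
            if cosine(profile, representative) >= threshold:
                members.append(idx)
                break
        else:
            groups.append((profile, [idx]))
    return [members for _, members in groups]


if __name__ == "__main__":
    # Two toy "alleles", each represented by two reads with one mismatch.
    reads = [
        "ACGTACGTACGTACGT",
        "ACGTACGTACGAACGT",
        "TTTGGGCCCTTTGGGCCC",
        "TTTGGGCCCTTTGGGCCA",
    ]
    print(group_reads(reads, k=3, threshold=0.5))  # [[0, 1], [2, 3]]
```

Because a k-mer profile ignores read position, a few sequencing errors change only a handful of counts, so noisy reads from the same allele stay similar without any alignment. In the workflow the abstract describes, each resulting group would then feed the multiple sequence alignment and consensus step. Real nanopore reads would need a much more careful choice of `k`, similarity measure and clustering than this toy example.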
