Abstract

Aim: Given the complex nature of the human HLA system, generating reliable, full-length, phase-defined reference sequences for novel HLA alleles remains a substantial challenge. Here we present a computational workflow that tackles this challenge by combining the respective strengths of two types of sequencing approaches: the high read quality of Illumina short-read data and the long read length of single-molecule sequencing approaches such as PacBio and Oxford Nanopore.

Methods: Long reads from heterozygous samples are first mapped against a global locus-specific reference. Based on this mapping, a SNP matrix is constructed and the long reads are clustered into two read sets representing the two haplotypes. These read sets are used to construct preliminary haplotype consensus sequences, against which the short reads are mapped. Integrating the alignments derived from the two complementary sequencing approaches eliminates each method's systematic biases in the final consensus sequences and generates HLA haplotypes that identify both single-nucleotide and structural variants at reference quality.

Results: The DR2S approach was evaluated with 90 heterozygous HLA class I samples. Error rates decreased with increasing long-read coverage. The quality gain due to augmenting long-read data with short-read data was most pronounced at low to moderate long-read coverage (

Conclusion: By integrating single-molecule and sequencing-by-synthesis NGS data, DR2S eliminates both single-nucleotide and structural variation errors and thereby yields reference-quality de novo sequences.
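The read-partitioning step described under Methods (build a SNP matrix from the long-read mapping, then split the reads into two haplotype sets) can be illustrated with a toy example. The following is a minimal sketch and not the DR2S implementation: it assumes the long reads have already been aligned and are given as equal-length strings in reference coordinates with "-" for gaps, and all function names, the `min_minor_fraction` threshold, and the simple two-centre clustering are assumptions chosen for illustration.

```python
from collections import Counter


def polymorphic_positions(reads, min_minor_fraction=0.2):
    """Return columns whose second most common base is frequent enough
    to look like a heterozygous SNP rather than a sequencing error.
    The 0.2 threshold is an illustrative assumption."""
    positions = []
    for col in range(len(reads[0])):
        counts = Counter(read[col] for read in reads if read[col] != "-")
        if len(counts) < 2:
            continue
        minor_count = counts.most_common(2)[1][1]
        if minor_count / sum(counts.values()) >= min_minor_fraction:
            positions.append(col)
    return positions


def snp_matrix(reads, positions):
    """One row per read, one column per polymorphic position."""
    return [[read[p] for p in positions] for read in reads]


def distance(row_a, row_b):
    """Number of mismatching SNP calls, ignoring gaps."""
    return sum(
        1 for a, b in zip(row_a, row_b) if a != b and a != "-" and b != "-"
    )


def column_consensus(rows):
    """Most common symbol in each column of a list of equal-length rows."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*rows)]


def cluster_two_haplotypes(matrix, max_iter=10):
    """Assign each read to haplotype 0 or 1 with a two-centre,
    k-means-style loop on the SNP matrix (hypothetical, simplified)."""
    centre_a = list(matrix[0])
    centre_b = list(max(matrix, key=lambda row: distance(row, centre_a)))
    centres = [centre_a, centre_b]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            0 if distance(row, centres[0]) <= distance(row, centres[1]) else 1
            for row in matrix
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in (0, 1):
            members = [row for row, lab in zip(matrix, labels) if lab == k]
            if members:
                centres[k] = column_consensus(members)
    return labels


def haplotype_consensus(reads, labels, haplotype):
    """Preliminary consensus sequence for one haplotype's read set."""
    members = [r for r, lab in zip(reads, labels) if lab == haplotype]
    return "".join(column_consensus(members))
```

Calling `polymorphic_positions`, `snp_matrix`, `cluster_two_haplotypes`, and `haplotype_consensus` in that order on a list of aligned reads yields two preliminary consensus strings. In the workflow described above, the short reads would then be mapped against such preliminary sequences to correct residual long-read errors; that polishing step is not shown here.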

