Aim: Using a previously developed long-fragment DNA capture methodology, PacBio single molecule real-time (SMRT) sequencing, and a custom de novo assembly analysis pipeline, we aim to resolve the long-range, fully phased MHC haplotypes of a single heterozygous individual.

Methods: Long-fragment DNA enrichment for the MHC was performed on fresh blood obtained from a heterozygous individual using our previously described method, RSE. Captured DNA was sequenced on the PacBio RSII sequencing platform. Raw reads were filtered to retain only those that map to annotated MHC haplotype sequences (n = 8). De novo assembly was performed using the Canu pipeline, with algorithm parameters tuned to optimize the continuity and accuracy of haplotigs (phased contigs) as compared to phased SNP genotyping data.

Results: De novo assembly of the PacBio reads produced 27 contiguous fragments (contigs) covering 97% of the 4 Mbp MHC (as aligned to the COX MHC haplotype). The haplotigs agreed with phased SNP genotyping data at ~78% of interrogated positions (n = 4,779 probes), indicating that the majority of haplotigs are properly phased. Phased SNP genotyping data were also used to phase across breaks in the assembly, resulting in two distinct MHC haplotypes for the sequenced sample.

Conclusions: Our results demonstrate the capacity to generate de novo assembled, fully phased MHC haplotypes from PacBio single molecule sequencing reads. Phasing across contig breaks caused by coverage gaps remains a challenge; we hope to resolve this issue using BioNano Genomics optical maps, so as not to rely on phased SNP genotyping data. In essence, this method has the capacity to resolve full-length MHC haplotype sequences without the need for any reference genome or familial trio analysis.
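The comparison of haplotigs against phased SNP genotyping data can be illustrated with a minimal sketch. It is not the pipeline used in this work; the function name, the input dictionaries, and the rule of assigning a haplotig to whichever haplotype it matches more often are all assumptions made for illustration. Only heterozygous, genotyped positions are treated as informative.

```python
from typing import Dict, Tuple


def haplotig_concordance(
    haplotig_alleles: Dict[int, str],
    phased_snps: Dict[int, Tuple[str, str]],
) -> Tuple[int, int, int]:
    """Compare one haplotig with phased SNP genotypes (hypothetical helper).

    haplotig_alleles: position -> base called on the haplotig.
    phased_snps: position -> (allele on haplotype 0, allele on haplotype 1).
    Returns (best_haplotype, matching_sites, informative_sites).
    """
    matches = [0, 0]
    informative = 0
    for pos, allele in haplotig_alleles.items():
        pair = phased_snps.get(pos)
        # Skip positions that were not genotyped or that are homozygous,
        # since they cannot distinguish the two haplotypes.
        if pair is None or pair[0] == pair[1]:
            continue
        informative += 1
        for hap in (0, 1):
            if allele == pair[hap]:
                matches[hap] += 1
    # Assign the haplotig to the haplotype it agrees with more often
    # (ties go to haplotype 0 in this sketch).
    best = 0 if matches[0] >= matches[1] else 1
    return best, matches[best], informative


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    phased = {100: ("A", "G"), 200: ("C", "T"), 300: ("G", "G"), 400: ("T", "C")}
    haplotig = {100: "A", 200: "C", 300: "G", 400: "C", 500: "A"}
    best, matched, total = haplotig_concordance(haplotig, phased)
    print(f"haplotype {best}: {matched}/{total} informative sites agree")
```

Under these assumptions, the per-haplotig assignment could also be used to order haplotigs on either side of an assembly break onto the same haplotype, which is the role the phased SNP data play in the Results.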