Abstract
The mammalian Major Histocompatibility Complex (MHC) region contains several gene families characterized by highly polymorphic loci with extensive nucleotide diversity, copy number variation of paralogous genes, and long repetitive sequences. This structural complexity has made it difficult to construct a reliable reference sequence of the horse MHC region. In this study, we used long-read single molecule, real-time (SMRT) sequencing technology from Pacific Biosciences (PacBio) to sequence eight Bacterial Artificial Chromosome (BAC) clones spanning the horse MHC class II region. The final assembly resulted in a 1,165,328 bp continuous, gap-free sequence with 35 manually curated genomic loci, of which 23 were considered to be functional and 12 to be pseudogenes. Compared with the MHC class II region in other mammals, the corresponding region in the horse shows extraordinary copy number variation and a different relative location and directionality of the Eqca-DRB, -DQA, -DQB and -DOB loci. This is the first long-read sequence assembly of the horse MHC class II region with rigorous manual gene annotation, and it will serve as an important resource for association studies of immune-mediated equine diseases and for evolutionary analysis of genetic diversity in this region.
Highlights
By screening the CHORI-241 library, Bacterial Artificial Chromosome (BAC) clones constituting a minimum tiling path of the horse Major Histocompatibility Complex (MHC) were identified and an ordered BAC contig map was constructed [24,25]
The strategy of sequencing single BAC clones resulted in mean read coverage exceeding 300-fold for each BAC clone, while the pooled sequencing strategy resulted in a mean coverage ranging from 108- to 287-fold (Table 1)
To produce a de novo assembly of the horse MHC class II region, we identified overlapping regions of the eight sequenced BAC clones
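The fold-coverage figures quoted in the highlights can be read as total sequenced bases divided by the length of the target clone. The following is a minimal sketch of that arithmetic only; the function name `mean_fold_coverage` and the read and clone lengths are hypothetical illustrations, not values or code from the study.

```python
def mean_fold_coverage(read_lengths, clone_length):
    """Return mean fold coverage: total read bases / target length.

    read_lengths : iterable of read lengths in bp (hypothetical input)
    clone_length : length of the BAC insert in bp (hypothetical input)
    """
    if clone_length <= 0:
        raise ValueError("clone_length must be positive")
    return sum(read_lengths) / clone_length


# Illustrative numbers only, not taken from the paper:
# 5,000 reads of 12 kb each over a 200 kb BAC clone.
reads = [12_000] * 5_000
bac_length = 200_000
print(mean_fold_coverage(reads, bac_length))  # 300.0
```

Under this simple definition, pooling several clones in one sequencing run spreads the same number of bases over a longer combined target, which is consistent with the lower per-clone coverage reported for the pooled strategy.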
Summary
By screening the CHORI-241 library, BAC clones constituting a minimum tiling path of the horse MHC were identified and an ordered BAC contig map was constructed [24,25]. Generating a reliable, high-quality de novo sequence assembly with traditional Sanger sequencing or short-read next-generation sequencing technologies over a complex region such as the horse MHC class II is inherently difficult because of the many multigene families with copy number variation of paralogous loci, long repetitive elements and extensive allelic polymorphism. Recent studies have shown that the long-read single molecule, real-time (SMRT) sequencing technology from Pacific Biosciences (PacBio) can be successfully applied to resolve complex genomic regions and improve genome assemblies [26,27,28,29]. The purpose of this study was to provide an enhanced and carefully annotated reference sequence of the MHC class II region.