The horse reference genome has improved with the release of EquCab3 and the first Y chromosome reference. However, the most complex sequences, such as those in the sex chromosomes, remain unresolved. These include large amplicons, repeats, and the pseudoautosomal boundary (PAB). We initiated a comprehensive study to improve the assembly of the horse sex chromosomes. To refine the assembly of complex regions, we are using three new technologies: trio-binning, Hi-C, and Bionano optical mapping. Trio-binning uses long-read sequences from F1 interspecific hybrids and short reads from the parent species. High-molecular-weight blood DNA was extracted from a female hinny and sequenced on 2 PacBio Sequel cells. Paired-end Illumina reads (150 bp) for horse (Twilight) and donkey (Willy) were obtained from SRA. These short reads and the hinny long reads were assembled with the trio-binning function of the Canu assembler. The initial assembly is 2.5 Gb in 1,757 contigs with an N50 of 41.5 Mb. We have completed Hi-C sequencing and Bionano optical maps for one Thoroughbred stallion (Bravo). Both technologies are needed to scaffold the trio-binned Canu assembly. Our initial goal was to use this assembly to better define the PAB in the horse. The PAB demarcates the end of the pseudoautosomal region (PAR), where X-Y recombination stops. Despite the evolutionary and biological importance of the PAB, the region has been characterized at the molecular level in only a few species, and the PAB of the horse is presently not well defined. Previously, we identified and Sanger sequenced 4 BAC clones, 2 spanning PAB-X and PAB-Y. To identify the PAB of the horse, the BAC sequences were aligned to the Y assembly, EquCab3, and a 42 Mb contig from the trio-binning assembly that corresponds to the short arm of the X. We identified a region on both X and Y where X-Y homology drops from over 97% (PAR) to almost zero, indicative of the PAB. This region corresponds to the location of the XKR3Y gene on the Y but is not well annotated on the X. We also identified a duplication and an inversion in EquCab3 that were not consistent with the corresponding region in the X-BACs or the new 42 Mb Xp contig, suggesting a mis-assembly in EquCab3. We believe that these approaches combined will also resolve other complex and repetitive portions of the horse sex chromosomes.
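The signature used to place the PAB, a sharp drop in X-Y identity from over 97% to almost zero, can be illustrated with a minimal sketch. The abstract does not state which alignment or scanning tools were used, so everything here is an assumption for illustration: the function names `windowed_identity` and `largest_drop`, the fixed non-overlapping windows, and the toy sequences are hypothetical, and the input is assumed to be a pre-computed gapped pairwise alignment of equal-length X and Y strings.

```python
def windowed_identity(x_aln, y_aln, window=1000):
    """Percent identity in non-overlapping windows of a gapped pairwise alignment.

    x_aln and y_aln are the two aligned rows (same length, '-' marks a gap).
    Gap columns count as mismatches.
    """
    if len(x_aln) != len(y_aln):
        raise ValueError("aligned sequences must have equal length")
    identities = []
    for start in range(0, len(x_aln), window):
        pairs = list(zip(x_aln[start:start + window], y_aln[start:start + window]))
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        identities.append(100.0 * matches / len(pairs))
    return identities


def largest_drop(identities):
    """Return (window_index, drop) for the steepest fall between adjacent windows.

    window_index is the first window on the low-identity side, i.e. a
    candidate boundary position in window units.
    """
    if len(identities) < 2:
        raise ValueError("need at least two windows")
    drops = [identities[i] - identities[i + 1] for i in range(len(identities) - 1)]
    i = max(range(len(drops)), key=drops.__getitem__)
    return i + 1, drops[i]


# Toy alignment (not real data): two identical PAR-like windows followed by
# two fully diverged windows.
par = "ACGTACGTAC"
x_row = par * 2 + "A" * 20
y_row = par * 2 + "C" * 20

profile = windowed_identity(x_row, y_row, window=10)
boundary_window, drop = largest_drop(profile)
```

On the toy input, the identity profile is flat at 100% across the first two windows and falls to 0% at the third, so the candidate boundary is window index 2 (alignment column 20). On real data the window size would need tuning, and repeats or local mis-assemblies can produce similar drops, which is why independent evidence such as the BAC sequences matters.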