Abstract

Background: Reference genomes are essential in the analysis of genomic data. As the cost of sequencing decreases, multiple reference genomes are being produced within species to alleviate problems such as low mapping accuracy and reference allele bias in variant calling that can be associated with the alignment of divergent samples to a single reference individual. The latest reference sequence adopted by the scientific community for the analysis of cattle data is ARS_UCD1.2, built from the DNA of a Hereford cow (Bos taurus taurus, hereafter B. taurus). A complementary genome assembly, UOA_Brahman_1, was recently built from a Brahman cow haplotype to represent the other cattle subspecies (Bos taurus indicus, hereafter B. indicus) and to further support analysis of B. indicus data. In this study, we aligned the sequence data of 15 B. taurus and B. indicus breeds to each of these references.

Results: The alignment of B. taurus individuals against UOA_Brahman_1 detected up to five million more single-nucleotide variants (SNVs) compared to that against ARS_UCD1.2. Similarly, the alignment of B. indicus individuals against ARS_UCD1.2 resulted in one and a half million more SNVs than that against UOA_Brahman_1. The number of SNVs with nearly fixed alternative alleles also increased in the cross-subspecies alignments. Interestingly, the alignment of B. taurus cattle against UOA_Brahman_1 revealed regions with fewer SNVs with nearly fixed alternative alleles than expected. Since B. taurus introgression represents on average 10% of the genome of Brahman cattle, we suggest that these regions comprise taurine DNA as opposed to indicine DNA in the UOA_Brahman_1 reference genome. Principal component and admixture analyses using genotypes inferred from these regions support these taurine-introgressed loci. Overall, the flagged taurine segments represent 13.7% of the UOA_Brahman_1 assembly. The genes located within these segments were previously reported to be under positive selection in Brahman cattle, and include functional candidate genes implicated in feed efficiency, development and immunity.

Conclusions: We report a list of taurine segments that are in the UOA_Brahman_1 assembly, which will be useful for the interpretation of interesting genomic features (e.g., signatures of selection, runs of homozygosity, increased mutation rate, etc.) that could appear in future re-sequencing analysis of indicine cattle.

Highlights

  • Reference genomes are essential in the analysis of genomic data

  • Aligned reads in sequence alignment map (SAM) files were sorted by chromosome and then converted to binary alignment map (BAM) files using the sort function of samtools v1.10 [20]

  • Duplicate reads in each BAM file were flagged using the Picard MarkDuplicates tool bundled with the Genome Analysis Toolkit (GATK v.4.1.0) [21]
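
The two preprocessing steps above (sorting SAM into BAM with samtools, then flagging duplicates with MarkDuplicates) can be sketched as a small Python helper that only builds the command lines. This is a minimal illustration, not the authors' pipeline: the file-naming scheme, the function names and the metrics file are assumptions made for this example, and the study may have used additional options that are not stated here.

```python
import shlex
from pathlib import Path
from typing import List


def sort_command(sam_path: str) -> List[str]:
    """Build a `samtools sort` call that coordinate-sorts a SAM file and writes BAM."""
    sam = Path(sam_path)
    # Hypothetical naming scheme: sample.sam -> sample.sorted.bam
    sorted_bam = str(sam.with_name(sam.stem + ".sorted.bam"))
    return ["samtools", "sort", "-O", "bam", "-o", sorted_bam, str(sam)]


def markdup_command(sorted_bam: str) -> List[str]:
    """Build a GATK4 `MarkDuplicates` call that flags (does not remove) duplicates."""
    bam = Path(sorted_bam)
    # Hypothetical naming scheme for the output BAM and the metrics report
    dedup_bam = str(bam.with_name(bam.stem + ".markdup.bam"))
    metrics = str(bam.with_name(bam.stem + ".markdup_metrics.txt"))
    return ["gatk", "MarkDuplicates", "-I", str(bam), "-O", dedup_bam, "-M", metrics]


def preprocessing_commands(sam_path: str) -> List[List[str]]:
    """Return the two commands in the order they would be run for one sample."""
    sort_cmd = sort_command(sam_path)
    sorted_bam = sort_cmd[sort_cmd.index("-o") + 1]
    return [sort_cmd, markdup_command(sorted_bam)]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # without samtools or GATK installed.
    for cmd in preprocessing_commands("sample1.sam"):
        print(shlex.join(cmd))
```

To execute the steps rather than print them, each list could be passed to `subprocess.run(cmd, check=True)`, provided samtools and GATK are available on the `PATH`.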


Summary

Introduction

Reference genomes are essential in the analysis of genomic data. As the cost of sequencing decreases, multiple reference genomes are being produced within species to alleviate problems such as low mapping accuracy and reference allele bias in variant calling that can be associated with the alignment of divergent samples to a single reference individual. The latest and most widely used cattle genome release, ARS_UCD1.2, has only 386 gaps in its final assembly. This assembly was an update of the UMD3.1 assembly, based on the same inbred Hereford cow as the DNA source, and provides 250× more continuity than its predecessor [9, 10]. Using parent-specific k-mers, the paternal and maternal haplotypes of the F1 animal were separated, leading to an Angus-specific assembly (UOA_Angus_1) and a Brahman-specific assembly (UOA_Brahman_1). Both assemblies were constructed by combining the latest sequencing technologies of PacBio long-reads, Hi-C data, Bio-nano optical reads and Illumina short reads, resulting in 277 and 302 gaps in the final assemblies, respectively, which is fewer than in GRCh38 and ARS_UCD1.2 [7, 9]. It should be noted that the UOA_Brahman_1 reference sequence is the first published de novo assembly of a B. indicus genome.

