Abstract

Recent advances in genomic sequencing technology and computational assembly methods have allowed scientists to improve reference genome assemblies in terms of contiguity and composition. EquCab2, a reference genome for the domestic horse, was released in 2007. Although equal or better in quality than other first-generation Sanger assemblies, it shared many of their shortcomings. In 2014, the equine genomics research community began a project to improve the reference sequence for the horse, building upon the solid foundation of EquCab2 and incorporating new short-read data, long-read data, and proximity ligation data. Here, we present EquCab3. The count of non-N bases in the incorporated chromosomes is improved from 2.33 Gb in EquCab2 to 2.41 Gb in EquCab3. Contiguity has also been improved nearly 40-fold, with a contig N50 of 4.5 Mb, and scaffold contiguity has been enhanced to the point where all but one of the 32 chromosomes consist of a single scaffold.
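The two headline metrics in the abstract, contig N50 and the gain in non-N bases, can be illustrated with a minimal Python sketch. The function names `n50` and `percent_increase` and the toy contig lengths are hypothetical and chosen only for illustration; they are not taken from the EquCab3 assembly pipeline. Only the 2.33 Gb and 2.41 Gb figures come from the text above.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def percent_increase(old, new):
    """Relative increase from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# Hypothetical contig lengths (in Mb), for illustration only.
toy_contigs = [8, 6, 5, 4, 3, 2, 2]
toy_n50 = n50(toy_contigs)

# Non-N base counts quoted in the abstract (in Gb).
gain = percent_increase(2.33, 2.41)

print(f"toy contig N50: {toy_n50} Mb")
print(f"increase in non-N bases: {gain:.1f}%")
```

With the quoted base counts, the relative gain works out to a little over 3%, which is consistent with the increase in assembled bases reported for the incorporated chromosomes.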

Highlights

  • Recent advances in genomic sequencing technology and computational assembly methods have allowed scientists to improve reference genome assemblies in terms of contiguity and composition.

  • We present here a new reference assembly for the domestic horse, EquCab3. This assembly benefited from rapidly evolving high-throughput sequencing technologies and new algorithms used to assemble data from these platforms. This project began from the solid foundation of 6.8-fold coverage Sanger sequence data[2], as well as a radiation hybrid map and fluorescence in situ hybridization (FISH) data[20].

  • The previously published datasets comprise the data used to construct EquCab2: Sanger sequencing data, bacterial artificial chromosome (BAC)-end pairs[2], and a physical map containing radiation hybrid and FISH markers[20].

Summary

Introduction

Recent advances in genomic sequencing technology and computational assembly methods have allowed scientists to improve reference genome assemblies in terms of contiguity and composition. In 2014, the equine genomics research community began a project to improve the reference sequence for the horse, building upon the solid foundation of EquCab2 and incorporating new short-read data, long-read data, and proximity ligation data. This assembly benefited from rapidly evolving high-throughput sequencing technologies and new algorithms used to assemble data from these platforms. This project began from the solid foundation of 6.8-fold coverage Sanger sequence data[2], as well as a radiation hybrid map and FISH data[20]. The resulting assembly is enhanced in both contiguity and composition. This new version of the reference sequence for the domestic horse reduces the number of gaps 10-fold and increases the number of assembled bases by 3% in the incorporated chromosomes over EquCab2.
