Abstract

Assessing the correctness of an assembled chromosome architecture is a central challenge. We create a geometric analysis method, GenomeLandscaper, to conduct landscape analysis of genome-fingerprints maps (GFM), trace large-scale repetitive regions, and assess their impact on the global architectures of assembled chromosomes. We also develop an alignment-free method for phylogenetic analysis. The human Y chromosomes (GRCh.chrY, HuRef.chrY and YH.chrY) are analysed as a proof-of-concept study. We construct a galaxy of genome-fingerprints maps (GGFM) for them and observe a landscape compatibility among relatives. However, a long, sharp, straight line on the GGFM breaks this landscape compatibility, distinguishing GRCh38p1.chrY (through GRCh38p7.chrY) from GRCh37p13.chrY, HuRef.chrY and YH.chrY. Deleting a 1.30-Mbp target segment rescues the landscape compatibility, matching the antecedent GRCh37p13.chrY. We relocate this segment into the modelled centromeric and pericentromeric region of GRCh38p10.chrY, where it matches a gap placeholder of GRCh37p13.chrY. We decompose it into sub-constituents (such as BACs, interspersed repeats, and tandem repeats) and trace their homologues by phylogenetic analysis. We find that most of the examined tandem repeats are of reasonable quality, but the BAC-sized repeats 173U1020C (176.46 Kbp) and 5U41068C (205.34 Kbp) are likely over-repeated. These results offer unique insights into the centromeric and pericentromeric regions of the human Y chromosomes.

Highlights

  • Centromeres and telomeres of mammalian genomes are large-scale repetitive regions that remain largely inaccessible because of technical constraints in sequencing and assembly[1,2,3], even though assembly has been improved by de novo assemblers such as Celera[4,5,6], SOAPdenovo[7], Supernova[8], Canu[9], HINGE[10] and Recon[11] working with data from platforms such as Sanger, Illumina, PacBio and Oxford Nanopore

  • Based on our GenomeFingerprinter algorithm[19], here we establish a method to construct a galaxy of genome-fingerprints maps (GGFM), which comprises a set of genome-fingerprints maps (GFM) that are simultaneously constructed for a set of chromosomes under comparison

  • To conduct landscape analysis of genome-fingerprints maps (GFM) and retrospectively assess the global architectures of assembled chromosomes, we establish the GenomeLandscaper method based on our GenomeFingerprinter algorithm[19]

Introduction

Centromeres and telomeres of mammalian genomes are large-scale repetitive regions that remain largely inaccessible because of technical constraints in sequencing and assembly[1,2,3], even though assembly has been improved by de novo assemblers such as Celera[4,5,6], SOAPdenovo[7], Supernova[8], Canu[9], HINGE[10] and Recon[11] working with data from platforms such as Sanger, Illumina, PacBio and Oxford Nanopore. It is challenging to retrospectively assess the correctness of an assembled chromosome architecture, both because no “true” sequence is available for reference[1,2,3] and because data-driven analysis is hampered by the computational burden of base-to-base alignment at a large scale. The human genome has several typical assemblies, such as GRCh from a mixture of diploids (Global)[12,13], HuRef from an individual diploid (USA)[4,5,6] and YH from an individual diploid (Asia)[7]. As a proof-of-concept study, we create a GGFM for the human Y chromosomes (GRCh.chrY[12,13], HuRef.chrY[4,5,6] and YH.chrY[7]) and assess their global architectures. This study establishes a method to retrospectively assess the correctness of assembled chromosome architectures by evaluating the quality of their multiple assemblies, which is crucial for assessing, reconstructing and using complex genomes.
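To illustrate why alignment-free comparison sidesteps the cost of base-to-base alignment, the following minimal Python sketch compares two sequences through their k-mer frequency profiles. This is a generic stand-in and not the GenomeFingerprinter or GenomeLandscaper algorithm, whose details are not given here; the function names `kmer_profile` and `profile_distance` and the default `k=3` are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=3):
    """Return the normalized frequencies of all 4**k DNA k-mers in seq.

    Generic illustration only; k-mers containing non-ACGT symbols
    (for example N in assembly gaps) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def profile_distance(seq_a, seq_b, k=3):
    """Euclidean distance between two k-mer profiles (no alignment needed)."""
    pa = kmer_profile(seq_a, k)
    pb = kmer_profile(seq_b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


if __name__ == "__main__":
    # Hypothetical toy sequences, not real chromosome data.
    repeat_short = "ACGT" * 50
    repeat_long = "ACGT" * 100
    homopolymer = "A" * 200
    print(profile_distance(repeat_short, repeat_long))
    print(profile_distance(repeat_short, homopolymer))
```

Each profile is built in a single pass over its sequence, so the cost grows linearly with sequence length rather than with the product of the two lengths. The trade-off is that a composition profile discards positional information, which is the kind of global architecture a landscape-style analysis is meant to capture.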

