Abstract

In alignments of closely related genome sequences, discordant mutation sites, which do not reflect the phylogenetic relationships among the genomes, are often observed. Although these discordant sites are thought to have arisen through ancestral polymorphism or gene flow, their frequency and genomic distribution have not yet been analyzed in detail. Using the genome sequences of all protein-coding genes of 25 inbred rat strains, we analyzed the frequency and genome-wide distribution of discordant mutation sites. A comparison of different substrains showed that these loci are not substrain-specific but are shared among different groups of substrains, suggesting that the discordant sites may have arisen mainly through ancestral polymorphism. The discordant sites are also not uniformly distributed along the chromosomes; instead, they are concentrated at certain loci, such as RT1 (the rat major histocompatibility complex) and the olfactory receptor genes, indicating that genes known to be highly polymorphic tend to carry more discordant sites. Our results also show that loci with a high density of discordant sites are rich in heterozygous variants, even though these strains are inbred.
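
To make the notion of a "discordant" site concrete, here is a minimal sketch, assuming biallelic, homozygous genotype calls (reasonable for inbred strains, though heterozygous calls are not handled), a known rooted strain tree, and a single-mutation compatibility criterion: a site is treated as concordant when the strains carrying one allele form a clade of the tree, so one mutation on one branch explains it. This is an illustrative criterion, not necessarily the authors' procedure. The function names (`clades`, `is_discordant`, `discordant_density`), the nested-tuple tree encoding, the strain names A to D, and the 1 Mb window size are all hypothetical choices for this example.

```python
from collections import Counter


def clades(tree):
    """Return the leaf set of every subtree of a rooted tree.

    Hypothetical encoding: a tree is either a leaf (a strain name, str)
    or a tuple of subtrees.
    """
    result = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        result.add(leaves)
        return leaves

    walk(tree)
    return result


def is_discordant(site, tree_clades, all_strains):
    """True if a biallelic site cannot be explained by one mutation on the tree.

    `site` maps strain -> allele (0 or 1). The site is concordant when the
    strains carrying allele 1, or those carrying allele 0, form a clade.
    """
    carriers = frozenset(s for s, allele in site.items() if allele == 1)
    others = all_strains - carriers
    if not carriers or not others:
        return False  # monomorphic: no mutation to explain
    return carriers not in tree_clades and others not in tree_clades


def discordant_density(sites, window=1_000_000):
    """Count discordant sites per fixed-size window along one chromosome.

    `sites` is an iterable of (position, is_discordant) pairs; the window
    size is an arbitrary assumption. Returns {window_index: count}.
    """
    counts = Counter()
    for pos, discordant in sites:
        if discordant:
            counts[pos // window] += 1
    return dict(sorted(counts.items()))


# Toy example with four hypothetical strains A-D.
tree = (("A", "B"), ("C", "D"))
tree_clades = clades(tree)
strains = frozenset("ABCD")
print(is_discordant({"A": 1, "B": 1, "C": 0, "D": 0}, tree_clades, strains))  # False
print(is_discordant({"A": 1, "B": 0, "C": 1, "D": 0}, tree_clades, strains))  # True
```

In the toy example, the site shared by A and B matches the clade (A, B) and is concordant. The site shared by A and C needs at least two mutations on this tree. Explaining it with a single mutation instead requires ancestral polymorphism or gene flow, which is exactly the situation the abstract describes. Counting such sites per window, as `discordant_density` does, is one simple way to see whether they cluster at particular loci.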

Highlights

  • One possible origin of discordant sites is ancestral polymorphism, i.e., polymorphism that predates the divergence of the lineages (Slatkin and Pollack 2008).

  • The other possible origin is gene flow between lineages (Slatkin 1987).

Introduction

Since genome sequencing was completed for representative model organisms such as mice and rats (Waterston et al 2002; Gibbs et al 2004), the genomes of many other species have been sequenced, and even more sequencing projects are in progress (Koepfli et al 2015). By comparing the genome sequences of different species within a clade, it is possible to identify functionally important sites outside genes, such as conserved non-coding sequences. Within a species, too, the genomes of multiple strains are increasingly being sequenced. One reason is that, once the reference genome of a species has been determined, sequencing additional strains is comparatively easy because the laborious assembly process is no longer required. Another reason is that next-generation sequencing technology, which is still improving drastically in throughput, speed, and cost, is greatly accelerating research in this direction. By comparing these closely related genome sequences, we can identify strain-specific traits such as disease susceptibility (e.g., Fairfield et al 2011).

