Abstract

Bi-allelic Single Nucleotide Polymorphism (SNP) markers are widely used in population genetic studies. In most studies, the sequences on either side of the SNPs remain unused, although they contain information beyond that used in population genetic studies. In this study, we show how the sequence tags flanking a SNP can be used for comparative genome analysis. We used SNP data derived from DArTseq (Diversity Array Technology) for a non-model Australian native freshwater fish, Macquaria ambigua, to identify genes linked to SNP-associated sequence tags and to discover homologies with evolutionarily conserved genes and genomic regions. We concatenated 6,776 SNP sequence tags to create a hypothetical genome (representing 0.1–0.3% of the actual genome), which we used to find sequence homologies with 12 model fish species using the Ensembl genome browser with stringent filtering parameters. We identified sequence homologies for 17 evolutionarily conserved genes (cd9b, plk2b, rhot1b, sh3pxd2aa, si:ch211-148f13.1, si:dkey-166d12.2, zgc:66447, atp8a2, clvs2, lyst, mkln1, mnd1, piga, pik3ca, plagl2, rnf6, sec63) along with an ancestral evolutionarily conserved syntenic block (euteleostomi Block_210). Our analysis also revealed repetitive sequences covering approximately 12% of the hypothetical genome, among which DNA transposons, LTR retrotransposons and non-LTR retrotransposons were the most abundant. The number of sequence homologies followed a hierarchical pattern, with more homologies found in phylogenetically closer species, which supports the repeatability of the approach. This new approach of using SNP-associated sequence tags for comparative genome analysis may provide insight into the genome evolution of non-model species for which whole genome sequences are unavailable.
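The concatenation step and the repeat-coverage figure described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `concatenate_tags` and `repeat_coverage` are hypothetical, the tags are toy data, and the repeat intervals are assumed to come from a separate repeat-masking step expressed as half-open (start, end) coordinates on the concatenated sequence.

```python
from typing import Iterable, List, Tuple


def concatenate_tags(tags: Iterable[str]) -> Tuple[str, List[Tuple[int, int]]]:
    """Join sequence tags end to end.

    Returns the concatenated sequence and the half-open (start, end)
    position of each tag within it, so that a hit on the concatenated
    sequence can be traced back to its source tag.
    """
    pieces: List[str] = []
    spans: List[Tuple[int, int]] = []
    pos = 0
    for tag in tags:
        seq = tag.strip().upper()
        if not seq:
            continue  # skip empty records
        pieces.append(seq)
        spans.append((pos, pos + len(seq)))
        pos += len(seq)
    return "".join(pieces), spans


def repeat_coverage(intervals: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of the sequence covered by repeat intervals.

    Overlapping or adjacent intervals are merged first so that no base
    is counted twice.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged interval and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


if __name__ == "__main__":
    # Toy tags only; real DArTseq tags would be read from the marker file.
    genome, spans = concatenate_tags(["acgt", "GGCC\n", "", "TTA"])
    print(genome, spans)
    print(repeat_coverage([(0, 3), (2, 5), (8, 10)], len(genome)))
```

Keeping the per-tag spans alongside the concatenated sequence is a design choice made here for illustration: it lets any homology found on the hypothetical genome be mapped back to the individual SNP tag it came from.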

Highlights

  • In recent years, advances in next-generation sequencing technology have yielded higher resolution data for molecular genetic analyses

  • Analysis of masked repetitive sequences revealed a total of 628 repetitive fragments representing 49,561 bp covering approximately 12.21% of the GP-H-Genome (Table 1)

  • The mapping strategy used for the concatenated M. ambigua genome revealed a variable number of homologies across fish species, leading to a hierarchical pattern supporting the phylogenetic position of the taxa in the fish tree (Fig 2)


Summary

Introduction

Advances in next-generation sequencing technology have yielded higher resolution data for molecular genetic analyses. Sequencing data, ranging from short genomic fragments to whole genome sequences, have been used to answer critical questions about evolutionary genetics through comparative genome analysis [1,2,3]. While whole genome sequencing provides the highest resolution for comparative analyses, it remains expensive and may not be cost effective for non-model species. Sequencing short genomic fragments, including molecular markers (microsatellites, SNPs), costs less and, while providing lower resolution, may still be useful for comparative genome analysis [4]. In most studies, the sequence data on either side of the SNPs remain unused, although these sequences contain information beyond that used in population genetic studies, such as the identification of evolutionarily conserved regions [5,6,7].
