Abstract

Background
Since the completion of the rat reference genome in 2003, whole-genome sequencing data from more than 40 rat strains have become available. These data represent the broad range of strains used in rat research, including commonly used substrains. Currently, this wealth of information cannot be used to its full extent, because the different variant calling algorithms employed by different groups impair comparison between strains. In addition, all rat whole-genome sequencing studies to date have used an outdated reference genome for analysis (RGSC3.4, released in 2004).

Results
Here we present a comprehensive, multi-sample and uniformly called set of genetic variants in 40 rat strains, including 19 substrains. We reanalyzed all primary data using a recent version of the rat reference assembly (RGSC5.0, released in 2012) and identified over 12 million genomic variants (SNVs, indels and structural variants) among the 40 strains. In total, 28,318 SNVs are specific to individual substrains. This may be explained by introgression from other, unsequenced strains and by ongoing evolution through genetic drift. Substrain SNVs may have a larger predicted functional impact than older, shared SNVs.

Conclusions
In summary, we present a comprehensive catalog of uniformly analyzed genetic variants among 40 widely used rat inbred strains based on the RGSC5.0 assembly. This is a valuable resource that will facilitate functional genomic research in the rat. In line with previous observations, our genome-wide analyses show no evidence that multiple ancestral founder rat subspecies contributed to the currently used rat inbred strains. This differs from mouse inbred strains, where such a contribution exists. In addition, we find that the degree of substrain variation differs widely between strains. This matters for correctly interpreting experimental data from different labs.

Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1594-1) contains supplementary material, which is available to authorized users.

Highlights

  • Since the completion of the rat reference genome in 2003, whole-genome sequencing data from more than 40 rat strains have become available

  • We gathered the genomes of 37 rat strains that were sequenced previously [7,8,9,10,11,12] (Table 1). We analyzed them together with newly derived sequences from the BN-Lx/CubPrin, spontaneously hypertensive rat (SHR)/OlaIpcvPrin and SHR/NCrlPrin rat strains (Additional file 1).

  • After applying strict criteria and using multi-sample variant calling, we identified a total of 9,183,702 single nucleotide variants (SNVs), 3,001,935 indels and 63,664 structural variants relative to the reference assembly.
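The three variant classes listed above add up to the "over 12 million" total quoted in the abstract. This is a consistency check computed from the counts reported here, not a separately reported figure:

```latex
\underbrace{9{,}183{,}702}_{\text{SNVs}} + \underbrace{3{,}001{,}935}_{\text{indels}} + \underbrace{63{,}664}_{\text{structural variants}} = 12{,}249{,}301 \approx 12.2 \times 10^{6}
```

SNVs make up roughly three quarters of this total (9,183,702 / 12,249,301 ≈ 0.75).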


Summary

Introduction

After the first rat reference genome assembly became available in 2003 [13], the first variation catalog of a non-reference inbred strain, the spontaneously hypertensive rat (SHR), was published in 2010 [7]. These data were later combined with the BN-Lx genome sequence and extended with RNA sequencing data. The result was a comprehensive catalog of genetic variation and associated quantitative and qualitative transcription phenotypes in the HXB/BXH recombinant inbred (RI) panel [8]. A large community-driven effort in rat genome sequencing yielded variation catalogs of 25 inbred strains and substrains [11]. Analysis of these data identified strain-specific selective sweeps, as well as gene clusters that implicated genes involved in the development of cardiovascular disease in the rat.
