TranscriptSNPView: a genome-wide catalog of mouse coding variation

Fiona Cunningham, Jane Rogers, Javier Herrero, Paul Flicek, Zemin Ning, Daniel Rios, Louise Van Der Weyden, Mark Griffiths, Tony Cox, James Smith, Allan Bradley, Pablo Marin-Garcia, David J Adams, Ewan Birney

doi:10.1038/ng0806-853a

Open Access

Abstract

To the Editor: With the recent release of the genome-wide sequence for multiple inbred mouse strains [1], and with resequencing data for a large number of additional strains entering the public domain (http://www.niehs.nih.gov/crg/cprc.htm), we are one step closer to being able to identify the underlying genetic variants responsible for the trait characteristics that define each strain. Here, we describe a genome-wide catalog of coding variation in the mouse genome that was developed using an extensive collection of mouse DNA sequence reads, including those recently released by Celera, data from dbSNP [2] and resequencing data generated by Perlegen Sciences for the US National Institute of Environmental Health Sciences (NIEHS). To display these data, we developed a new software tool, TranscriptSNPView, which has been integrated into the Ensembl Genome Browser to take advantage of the evolving mouse genome assembly and the latest Ensembl [3] and Vega gene predictions [4]. TranscriptSNPView can be accessed via the Ensembl Genome Browser (http://www.ensembl.org/Mus_musculus/transcriptsnpview).

TranscriptSNPView displays coding SNP data from 48 mouse strains (Supplementary Table 1 online). Using the SNP calling algorithm ssahaSNP [5], we computed over 50 million SNPs from the common laboratory Mus musculus strains A/J, DBA/2J, 129X1/SvJ and 129S1/SvImJ from whole-genome shotgun sequence reads generated by Celera, and from C3HeB/FeJ and NOD BAC-end sequence reads generated by the Wellcome Trust Sanger Institute. We also generated SNP calls from the Mus musculus molossinus strain MSM/Ms using sequence reads generated by RIKEN [6] (Supplementary Table 1). Collectively, these SNP calls have been designated ‘Sanger SNPs’. The 25 million DNA sequence reads used to generate the Sanger SNP collection represent 7.32-fold coverage of the NCBI mouse build 35 genome assembly and are available via the Ensembl trace repository (http://trace.ensembl.org).
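A fold-coverage figure of this kind is conventionally the total number of sequenced bases divided by the length of the genome assembly. The following is a minimal sketch of that arithmetic in Python. The function name and the example inputs are illustrative assumptions; the letter does not report a mean read length or an assembly length, so none are implied here.

```python
def fold_coverage(n_reads: int, mean_read_length: float, genome_length: int) -> float:
    """Average sequencing depth: total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return n_reads * mean_read_length / genome_length


# Hypothetical inputs chosen only to show the calculation,
# not taken from the Sanger SNP data set.
example_depth = fold_coverage(
    n_reads=1_000_000,
    mean_read_length=500,
    genome_length=100_000_000,
)
```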
The Sanger SNP calls were distilled to 6.87 million nonredundant genome-wide SNP features and were combined with an additional 6.4 million dbSNP entries (version 126), providing data for an additional 41 mouse strains. By merging these data sets and mapping them against the Ensembl 38.35 mouse gene build, we collated 726,462 coding SNP variants across all strains and computed their amino acid consequences to identify 249,996 nonsynonymous coding changes and 2,667 stop codons. Coding SNP figures for each strain are provided in Supplementary Table 1.

We also identified instances where stop codons had been lost, and we predicted mutations in introns, in invariant intronic splice sites and in untranslated and regulatory regions. These predictions, which can be used as a basis for identifying functional SNP variants, are displayed in TranscriptSNPView. A detailed description of all of the features of TranscriptSNPView is provided in the Supplementary Note online. A data collection of this quality and depth is unprecedented and will provide the means to obtain a high-resolution picture of coding variation in the mouse genome. TranscriptSNPView represents a powerful new tool for functional analysis of the mouse genome and will become a central repository for mouse coding variation data.
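To illustrate what computing the amino acid consequence of a coding SNP involves, here is a minimal Python sketch that substitutes one base in a reference codon and compares the translated residues using the standard genetic code. It is not the Ensembl or TranscriptSNPView implementation; the function name `classify_coding_snp` and the category labels are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, codon_offset: int, alt_base: str) -> str:
    """Classify a single-base change within a codon (hypothetical labels).

    ref_codon    -- three-base reference codon on the coding strand
    codon_offset -- position of the SNP within the codon (0, 1 or 2)
    alt_base     -- alternative allele on the coding strand
    """
    if codon_offset not in (0, 1, 2):
        raise ValueError("codon_offset must be 0, 1 or 2")
    ref_codon = ref_codon.upper()
    alt_codon = (
        ref_codon[:codon_offset] + alt_base.upper() + ref_codon[codon_offset + 1:]
    )
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"      # same residue (or stop retained)
    if alt_aa == "*":
        return "stop_gained"     # new stop codon introduced
    if ref_aa == "*":
        return "stop_lost"       # reference stop codon removed
    return "nonsynonymous"       # amino acid substitution
```

For example, `classify_coding_snp("GAA", 0, "T")` produces the codon TAA and is reported as `stop_gained`, whereas `classify_coding_snp("CTT", 2, "C")` leaves leucine unchanged and is reported as `synonymous`.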

Journal: Nature Genetics	Publication Date: Aug 1, 2006
Citations: 16	License type: public-domain
