Abstract

Background

With the recent progress made in large-scale genome sequencing projects, a vast amount of novel data is becoming available. Comparative sequence analysis, exploiting sequence information from various resources, can be used to uncover hidden information, such as genetic variation. Although enormous numbers of SNPs for a wide variety of organisms have been submitted to NCBI dbSNP and annotated in most genome assembly viewers, such as Ensembl and the UCSC Genome Browser, these platforms do not easily allow for extensive annotation and incorporation of experimental data supporting the polymorphism. Such information is, however, very important for selecting the most promising and useful candidate polymorphisms for use in experimental setups.

Description

The CASCAD database is designed for presentation and query of candidate SNPs that are retrieved by in silico mining of high-throughput sequencing data. Currently, the database provides collections of laboratory rat (Rattus norvegicus) and zebrafish (Danio rerio) candidate SNPs. The database stores detailed information about the raw data supporting each candidate, extensive annotation and links to external databases (e.g. GenBank, Ensembl, UniGene, and LocusLink), verification information, and predictions of a potential effect for non-synonymous polymorphisms in coding regions. The CASCAD website allows searches based on an arbitrary combination of 27 different parameters related to characteristics such as candidate SNP quality, genomic localization, and sequence data source or strain. In addition, the database can be queried with any custom nucleotide sequence of interest. The interface is crosslinked to other public databases and tightly coupled with primer design and local genome assembly interfaces in order to facilitate experimental verification of candidates.

Conclusions

The CASCAD database discloses detailed information on rat and zebrafish candidate SNPs, including the raw data underlying their discovery. An advanced web-based search interface provides universal access to the database content and supports a variety of queries for many types of research utilizing single nucleotide polymorphisms.

Highlights

  • With the recent progress made in large-scale genome sequencing projects, a vast amount of novel data is becoming available

  • A comprehensive inventory of SNPs, including extensive annotation, will be extremely valuable in the search for functional polymorphisms

  • In an effort to address these two issues, we have developed an in silico candidate SNP mining pipeline that uses all publicly available sequence data for a specific organism, and designed a database, CASCAD (CAscad SNP CAndidates Database), that allows storage of a wide variety of primary source data, cross-annotation to other databases, and analysis parameters for SNPs associated with expressed sequences


Summary

Conclusions

The main purpose of the CASCAD database is to provide flexible access to candidate single nucleotide polymorphisms that were predicted using a computational approach from publicly available sequence data of the rat and zebrafish. The resulting database is crosslinked to the most common public databases and can be queried for SNPs by accession number, sequence context, SNP characteristics, and parameters specific to the SNP discovery process, allowing stringent or relaxed conditions suitable for different types of applications. The database is freely accessible through the website http://cascad.niob.knaw.nl. Scripts, MySQL database dumps, and instructions for setting up a species-specific SNP database can be obtained from the authors upon request.

VG designed and implemented the CASCAD database. EC provided supervision and guidance for the project.
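The idea of combining search parameters under stringent or relaxed conditions, as described in the conclusions, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the table name `candidate_snp`, its columns, the filter names, the sample rows, and the use of an in-memory SQLite database in place of the MySQL backend. None of it reflects the actual CASCAD schema or its 27 search parameters.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not the actual CASCAD tables.
SCHEMA = """
CREATE TABLE candidate_snp (
    id INTEGER PRIMARY KEY,
    species TEXT,
    strain TEXT,
    chromosome TEXT,
    quality INTEGER,           -- assumed per-candidate quality score
    supporting_reads INTEGER,  -- assumed count of reads backing the candidate
    verified INTEGER           -- 1 if experimentally confirmed, else 0
)
"""

# Made-up sample rows, used only to exercise the queries.
ROWS = [
    (1, "rat", "BN", "1", 45, 6, 1),
    (2, "rat", "SD", "1", 22, 2, 0),
    (3, "zebrafish", "TU", "7", 38, 4, 0),
    (4, "rat", "BN", "X", 15, 1, 0),
]

# Whitelist mapping each (hypothetical) search parameter to a SQL fragment.
FILTERS = {
    "species": "species = ?",
    "strain": "strain = ?",
    "chromosome": "chromosome = ?",
    "min_quality": "quality >= ?",
    "min_reads": "supporting_reads >= ?",
    "verified": "verified = ?",
}


def build_db():
    """Create an in-memory database holding the sample candidates."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO candidate_snp VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS
    )
    return conn


def search(conn, **criteria):
    """Return ids of candidates matching every supplied criterion."""
    clauses, params = [], []
    for name, value in criteria.items():
        if name not in FILTERS:
            raise ValueError(f"unknown search parameter: {name}")
        clauses.append(FILTERS[name])
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1"
    sql = f"SELECT id FROM candidate_snp WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


conn = build_db()
# Stringent: high quality and several supporting reads.
stringent = search(conn, species="rat", min_quality=40, min_reads=5)
# Relaxed: lower quality threshold, no read-count requirement.
relaxed = search(conn, species="rat", min_quality=20)
```

Values are passed as bound parameters and only whitelisted filter names reach the SQL text, so an arbitrary combination of criteria can be composed without string-concatenating user input.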
