Abstract

Background

The increasing use of DNA microarrays in genetical genomics studies creates a need for platforms with complete coverage of the genome. We compared the effective gene coverage of the mouse genome across commercial and noncommercial oligonucleotide microarray platforms by performing an in-house gene annotation of probes. We used only the probe information available from vendors and followed a process that any researcher might use to find the gene targeted by a given probe. To make consistent comparisons between platforms, probes in each microarray were annotated with an Entrez Gene ID, and the chromosomal position of each gene was obtained from the UCSC Genome Browser Database. Gene coverage was estimated as the percentage of Entrez Genes with a unique position in the UCSC Genome Browser Database that are tested by a given microarray platform.

Results

A MySQL relational database was created to store the mapping information for 25,416 mouse genes and for the probes in five microarray platforms (gene coverage in parentheses): Affymetrix 430 2.0 (75.6%), ABI Genome Survey (81.24%), Agilent (79.33%), Codelink (78.09%), and Sentrix (90.47%); and four array-ready oligosets: Sigma (47.95%), Operon v.3 (69.89%), Operon v.4 (84.03%), and MEEBO (84.03%). The differences in coverage between platforms were highly conserved across chromosomes. The arrays also differed in the number of redundant and nonspecific probes. The database can be queried through a web interface to compare specific genomic regions. The software used to create, update, and query the database is freely available as a toolbox named ArrayGene.

Conclusion

The software developed here allows researchers to create updated custom databases using public or proprietary gene information for any organism. ArrayGene allows easy comparisons of gene coverage between microarray platforms for any region of the genome.
The comparison presented here reveals that the commercial microarray Sentrix, which is based on the MEEBO public oligoset, showed the best mouse genome coverage currently available. We also suggest the creation of guidelines to standardize the minimum set of information that vendors should provide to allow researchers to accurately evaluate the advantages and disadvantages of using a given platform.
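
The coverage definition used above (the percentage of Entrez Genes with a unique genomic position that are targeted by at least one probe on a platform) can be illustrated with a minimal Python sketch. This is not the ArrayGene code: the function names, the tuple layout, and the toy gene IDs and probe names are all assumptions made for illustration.

```python
from collections import Counter


def genes_with_unique_position(gene_positions):
    """Return the gene IDs that appear exactly once in the position table.

    gene_positions: iterable of (gene_id, chrom, start, end) tuples
    (a hypothetical layout, not the actual UCSC table schema).
    """
    counts = Counter(gene_id for gene_id, *_ in gene_positions)
    return {gene_id for gene_id, n in counts.items() if n == 1}


def gene_coverage(probe_to_gene, reference_genes):
    """Percentage of reference genes targeted by at least one probe.

    probe_to_gene: dict mapping probe ID to an Entrez Gene ID, or None
    when the probe could not be annotated.
    """
    if not reference_genes:
        raise ValueError("reference gene set is empty")
    targeted = {g for g in probe_to_gene.values() if g is not None}
    return 100.0 * len(targeted & reference_genes) / len(reference_genes)


# Toy data (invented): gene 104 maps to two positions, so it is excluded.
positions = [
    (101, "chr1", 1000, 2000),
    (102, "chr1", 5000, 6000),
    (103, "chr2", 100, 900),
    (104, "chr2", 3000, 4000),
    (104, "chrX", 7000, 8000),
    (105, "chr3", 10, 500),
]
# Probes p1 and p2 are redundant (same gene); p5 has no annotation.
probes = {"p1": 101, "p2": 101, "p3": 103, "p4": 104, "p5": None}

reference = genes_with_unique_position(positions)
coverage = gene_coverage(probes, reference)
print(f"coverage: {coverage:.2f}%")
```

Because coverage is counted over genes rather than probes, redundant probes for the same gene do not inflate the figure, and probes whose target has no unique position do not contribute.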

Highlights

  • The increasing use of DNA microarrays for genetical genomics studies generates a need for platforms with complete coverage of the genome

  • We have developed a platform for microarray annotation that provides gene annotations for probes and genomic positions for tested genes in the mouse genome

  • Number of genes in the genome: the total number of genes was defined as the number of Entrez Genes with a unique genomic position in the UCSC Genome Browser Database [14]

Summary

Introduction

The increasing use of DNA microarrays in genetical genomics studies creates a need for platforms with complete coverage of the genome. To make consistent comparisons between platforms, probes in each microarray were annotated with an Entrez Gene ID, and the chromosomal position of each gene was obtained from the UCSC Genome Browser Database. The Resourcerer database [4] tackles this problem by pre-computing gene annotations for a more exhaustive list of microarrays and oligosets across a number of species [5]. This database is centered on 'tentative consensus' (TC) sequences, which are used as gene definitions. A different approach was taken by Mattes [7], who created a set of Perl scripts that use UniGene and LocusLink as gene identifiers, providing a more universal gene definition that can be cross-referenced with other databases. ArrayGene also allows chromosome-specific or genomic-region-specific comparisons of gene coverage.
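
A region-specific comparison of this kind can be sketched as follows. This is a hedged Python illustration, not the ArrayGene web interface or its queries: `region_coverage`, the dictionary layout, the platform labels, and the coordinates are hypothetical, and the overlap rule (any overlap with the interval counts) is an assumption.

```python
def region_coverage(gene_positions, targeted_genes, chrom, start, end):
    """Coverage (%) of the genes overlapping chrom:start-end.

    gene_positions: dict mapping gene ID to (chrom, start, end).
    targeted_genes: set of gene IDs tested by one platform.
    Returns None when no reference gene overlaps the region.
    """
    in_region = {
        gene_id
        for gene_id, (g_chrom, g_start, g_end) in gene_positions.items()
        if g_chrom == chrom and g_start < end and g_end > start
    }
    if not in_region:
        return None
    return 100.0 * len(in_region & targeted_genes) / len(in_region)


# Toy reference genes (invented coordinates).
gene_positions = {
    1: ("chr1", 1000, 2000),
    2: ("chr1", 3000, 4000),
    3: ("chr1", 5000, 6000),
    4: ("chr1", 7000, 8000),
    5: ("chr2", 1000, 2000),
}

# Hypothetical platforms and the genes each one targets.
platforms = {
    "platform_A": {1, 2, 3, 5},
    "platform_B": {1, 4},
}

# Compare platforms over chr1:0-10000, which holds genes 1 to 4.
comparison = {
    name: region_coverage(gene_positions, genes, "chr1", 0, 10000)
    for name, genes in platforms.items()
}
for name, pct in comparison.items():
    print(name, pct)
```

Restricting the reference set to a region before computing the percentage is what makes the comparison local: a platform with high genome-wide coverage can still score poorly on a particular interval of interest.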
