Abstract
Brassica researchers throughout the world are producing a vast quantity of data. Genomic data include gene or Expressed Sequence Tag (EST) sequences and genomic sequences from bacterial artificial chromosomes and whole genome shotgun approaches. Associated gene expression or transcriptome data are being produced using various microarray formats, Serial Analysis of Gene Expression (SAGE) and Massively Parallel Signature Sequencing (MPSS). Molecular marker data such as Simple Sequence Repeats (SSRs) and Single Nucleotide Polymorphisms (SNPs) are providing insights into genetic structure, genetic diversity and inherited traits within Brassica species. Phenotypic data are also increasing in complexity through the characterisation of broad, diverse germplasm collections and the development of advanced techniques in proteomics and metabolomics. Bringing this diverse set of data together in an integrated bioinformatics platform that permits interrogation across these broad fields is a significant challenge. The most advanced genome database structure currently available uses the EnsEMBL format. EnsEMBL permits broad data integration, comparative analysis between related organisms and efficient data interrogation. We have established a Brassica-centric EnsEMBL database founded on the current Arabidopsis thaliana EnsEMBL database, incorporating tracks for Brassica genes, genomic sequences and molecular markers. This database is publicly available to the Brassica research community and can be used as the foundation of a Brassica-based EnsEMBL database on completion of the B. rapa genome sequencing under the Multinational Brassica Genome Project.