Abstract

Background: Single-pass, partial sequencing of complementary DNA (cDNA) libraries generates thousands of chromatograms that are processed into high quality expressed sequence tags (ESTs), and then assembled into contigs representative of putative genes. Usually, to be of value, ESTs and contigs must be associated with meaningful annotations, and made available to end-users.

Results: A web application, Expressed Sequence Tag Information Management and Annotation (ESTIMA), has been created to meet the EST annotation and data management requirements of multiple high-throughput EST sequencing projects. It is anchored on individual ESTs and organized around different properties of ESTs including chromatograms, base-calling quality scores, structure of assembled transcripts, and multiple sources of comparison to infer functional annotation, Gene Ontology associations, and cDNA library information. ESTIMA consists of a relational database schema and a set of interactive query interfaces. These are integrated with a suite of web-based tools that allow a user to query and retrieve information. Further, query results are interconnected among the various EST properties. ESTIMA has several unique features. Users may run their own EST processing pipeline, search against preferred reference genomes, and use any clustering and assembly algorithm. The ESTIMA database schema is flexible and accepts output from any EST processing and assembly pipeline.

ESTIMA has been used for the management of EST projects of many species, including honeybee (Apis mellifera), cattle (Bos taurus), songbird (Taeniopygia guttata), corn rootworm (Diabrotica virgifera), catfish (Ictalurus punctatus, Ictalurus furcatus), and apple (Malus x domestica). The entire resource may be downloaded and used as is, or readily adapted to fit the unique needs of other cDNA sequencing projects.

Conclusions: The scripts used to create the ESTIMA interface are freely available to academic users in an archived format from . The entity-relationship (E-R) diagrams and the programs used to generate the Oracle database tables are also available. Detailed installation instructions and a tutorial are provided at the same website. At present, the chromatograms, EST databases, and their annotations have been made available for the cattle and honeybee brain EST projects. Non-academic users need to contact the W.M. Keck Center for Functional and Comparative Genomics, University of Illinois at Urbana-Champaign, Urbana, IL, for licensing information.

Highlights

  • Single-pass, partial sequencing of complementary DNA libraries generates thousands of chromatograms that are processed into high quality expressed sequence tags (ESTs), and assembled into contigs representative of putative genes

  • We describe Expressed Sequence Tag Information Management and Annotation (ESTIMA), software that provides a database schema for the management of raw and annotated ESTs and is coupled with a suite of custom web-based tools that facilitate searching various aspects of ESTs and contigs, visualization, pairwise searching by BLAST [9], and functional classification based on the controlled vocabulary defined by the Gene Ontology (GO) Consortium [11]

  • It serves as a stand-alone web application that allows users to store, access, research, and visualize the raw and annotated ESTs and contigs, including GO annotation

Summary

Results

ESTIMA is independent of an EST processing pipeline

ESTIMA is decoupled from the backend EST processing pipeline and from the clustering and assembly of ESTs. Both mouse and honeybee brain sequences may be used for a deeper phylogenetic search, with BLASTX against a non-redundant protein database, to test the tissue-specificity hypothesis. Compared with other public web applications such as the TIGR gene indices [5], ESTIMA projects allow access to singlets from the EST assemblies and to chromatogram retrieval. These singlets would include rare, novel transcripts and divergent homologs that are increasingly the sole motivation for a research project. Since ESTIMA includes only high quality sequences in the databases, users may search for and download these novel transcripts, and efficiently implement a homology search strategy using the web application. Another strength of ESTIMA is that it facilitates chromatogram and contig viewing from a common interface (Sequence ID).
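The idea of a schema that accepts output from any assembly pipeline and still exposes singlets can be illustrated with a small sketch. This is not the ESTIMA schema: the table names (`est`, `contig`, `contig_member`), the column names, and the helper functions are hypothetical, and SQLite is used here in place of the Oracle database the source describes, purely so the example is self-contained.

```python
import sqlite3

# Hypothetical, simplified tables for illustration only;
# these are NOT the actual ESTIMA tables or column names.
SCHEMA = """
CREATE TABLE est (
    est_id   TEXT PRIMARY KEY,
    library  TEXT NOT NULL,
    sequence TEXT NOT NULL
);
CREATE TABLE contig (
    contig_id TEXT PRIMARY KEY
);
CREATE TABLE contig_member (
    contig_id TEXT NOT NULL REFERENCES contig(contig_id),
    est_id    TEXT NOT NULL REFERENCES est(est_id),
    PRIMARY KEY (contig_id, est_id)
);
"""


def build_db(ests, assembly):
    """Load ESTs and an assembly result into an in-memory database.

    ests:     iterable of (est_id, library, sequence) tuples
    assembly: dict mapping contig_id -> list of member est_ids, as any
              clustering/assembly tool might report after parsing
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO est VALUES (?, ?, ?)", ests)
    for contig_id, members in assembly.items():
        conn.execute("INSERT INTO contig VALUES (?)", (contig_id,))
        conn.executemany(
            "INSERT INTO contig_member VALUES (?, ?)",
            [(contig_id, est_id) for est_id in members],
        )
    conn.commit()
    return conn


def find_singlets(conn):
    """Return IDs of ESTs that belong to no contig, sorted by ID."""
    rows = conn.execute(
        """
        SELECT e.est_id
        FROM est AS e
        LEFT JOIN contig_member AS m ON m.est_id = e.est_id
        WHERE m.est_id IS NULL
        ORDER BY e.est_id
        """
    )
    return [row[0] for row in rows]
```

Because the assembly is stored only as contig membership rows, the same tables could in principle hold results from different assemblers, and singlets fall out as the ESTs with no membership row.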

11. The Gene Ontology Consortium