Abstract

Tree species of the genus Eucalyptus are the most valuable and widely planted hardwoods in the world. Given the economic importance of Eucalyptus trees, much effort has been made towards the generation of specimens with superior forestry properties that can deliver high-quality feedstocks, customized to the industry's needs for both cellulosic (paper) and lignocellulosic biomass production. In line with these efforts, large sets of molecular data have been generated by several scientific groups, providing invaluable information that can be applied in the development of improved specimens. To fully explore the potential of the available datasets, a public database that provides integrated access to genomic and transcriptomic data from Eucalyptus is needed. EUCANEXT is a database that analyses and integrates publicly available Eucalyptus molecular data, such as the E. grandis genome assembly and predicted genes, ESTs from several species and digital gene expression from 26 RNA-Seq libraries. The database has been implemented on a Fedora Linux machine running MySQL and Apache, while Perl CGI was used for the web interfaces. EUCANEXT provides a user-friendly web interface for easy access to and analysis of publicly available molecular data from Eucalyptus species. This integrated database allows for complex searches by gene name, keyword or sequence similarity and is publicly accessible at http://www.lge.ibi.unicamp.br/eucalyptusdb. Through EUCANEXT, users can perform complex analyses to identify genes related to traits of interest using RNA-Seq libraries and tools for differential expression analysis. Moreover, the entire bioinformatics pipeline described here, including the database schema and Perl scripts, is readily available and can be applied to any genomic and transcriptomic project, regardless of the organism. Database URL: http://www.lge.ibi.unicamp.br/eucalyptusdb

Highlights

  • The Eucalyptus genus comprises more than 700 species and includes the most extensively planted hardwood trees in the world [1, 2]

  • The EUCANEXT database was developed with the main purpose of aiding the mining of genes related to important silvicultural properties, such as stress response and productivity, which can be identified by comparing RNA-Seq data from different species or from limiting nitrogen/water conditions

  • The database provides tools to compare gene expression, allowing for the identification of transcripts expressed in certain species or tissues, and to perform Gene Ontology enrichment analysis on a set of Phytozome transcript IDs uploaded by the user

Summary

Introduction

The Eucalyptus genus comprises more than 700 species and includes the most extensively planted hardwood trees in the world [1, 2].

The second table stores the digital gene expression of each Eucalyptus grandis transcript in each library (fields ‘rpkm’ and ‘read_count’). This table is linked to the ‘transcripts’ table (described in the genomic section) by the field ‘id_transcript’ and to the ‘rna_seq_libraries’ table by the field ‘id_library’.

In the case of a keyword search, EUCANEXT will return all genes with the searched annotation, each with its Eucalyptus grandis counterpart linked to the transcript interface described in the section ‘Searching for a transcript’. This search is recommended if the user wants to find transcripts related to a specific ontology term or to one ontology function.

The exact hypergeometric distributions were implemented using the gamma function [45].
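
As an illustration of how an exact hypergeometric probability can be computed through the gamma function, the following is a minimal Python sketch. It is not the EUCANEXT code, which is written in Perl and is not shown here. It assumes the usual enrichment setting: `N` annotated transcripts in the background, `K` of them carrying a given ontology term, `n` transcripts in the uploaded set and `k` of those carrying the term. All function and parameter names are hypothetical, and the one-sided upper-tail test is an assumption about how enrichment would be scored.

```python
from math import exp, lgamma


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_pmf(k, N, K, n):
    """Exact P(X = k) when drawing n items without replacement
    from a population of N items, K of which are 'successes'."""
    # Outside the support the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return exp(log_choose(K, k) + log_choose(N - K, n - k) - log_choose(N, n))


def enrichment_pvalue(k, N, K, n):
    """Upper-tail probability P(X >= k); an assumed one-sided enrichment test."""
    upper = min(n, K)
    total = sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))
    return min(1.0, total)  # guard against floating-point overshoot


if __name__ == "__main__":
    # Hypothetical numbers: 30000 background transcripts, 150 with the term,
    # 200 uploaded transcripts, 8 of which carry the term.
    print(enrichment_pvalue(8, 30000, 150, 200))
```

Working in log space with `lgamma` avoids overflow in the factorials, which matters when the background contains tens of thousands of transcripts.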

