Abstract

Transcriptome databases are an important source of structural and functional information about an organism, particularly for plants without a sequenced genome. This is the case for the olive tree (Olea europaea L.), one of the most important oil-producing plant species worldwide. In addition, reproductive tissues and seeds are the least studied parts of this species, despite their importance in allergies, germination success, and plant sterility, and as a source of valuable components for the agro-food industry, including seed storage proteins and triacylglycerides. Therefore, an automated workflow has been developed with our tool AutoFlow to construct an annotated transcriptome from raw reads (Sanger, Illumina, Roche/454, or a combination of them), combining open-source software (Bowtie2, CAP2, Euler-SR, MIRA3, Velvet/Oases, AutoFact, MREPS, GigaBayes) with software developed by our group (SeqTrimNext, Full-LengtherNext, Sma3). The resulting transcriptomes were used to build the ReprOlive database (http://reprolive.eez.csic.es), where descriptions, GO terms, InterPro signatures, EC numbers, graphical localization of enzymes in KEGG pathways, ORFs, SSRs, and the corresponding orthologues in Arabidopsis thaliana from TAIR and RefSeq can be browsed. Finally, expression data can be accessed and, in addition to a BLAST search, a semantic conceptualization using RDF was implemented, allowing Linked Data searches that retrieve the most up-to-date information on enzymes, interactions, allergens, and structures. The olive tree reproductive transcriptome was constructed from 2,077,309 raw reads (454/Roche Titanium+) and 1,549 Sanger sequences from different stages of pollen and stigma development, resulting in 72,846 contigs, of which 63,965 (87.8%) included at least one functional annotation and 55,356 (75.9%) had an orthologue.
From different seed stages, 1,425,911 raw reads (454/Roche Titanium+) are being used to obtain the seed transcriptome. Applications of these transcriptomes are described in the communications by Carmona et al. and Jiménez-Quesada et al. at this congress. This work was supported by co-funding from the ERDF, the Spanish MINECO, and the Andalusian PAIDI through grants BFU2011-22779, TIN2011-25840, TIN2014-58304-R, P10-CVI-6075, P10-AGR-6274, P11-CVI-7487, P11-TIC-7529 and P12-TIC-1519. The authors also acknowledge the use of the SCBI facilities of UMA.

