Abstract
Transcriptome data help to explore the molecular mechanisms behind extreme adaptations in non-model organisms such as mangroves. This data article describes five major data sets and two data subsets for a salt-secreting mangrove, Rhizophora mucronata Lam. A combination of Illumina HiSeq 2500, Trinity, BLASTX, Bowtie 2 and Blast2GO was used for RNA-Seq, de novo assembly, transcript annotation, gene expression estimation and gene ontology annotation, respectively. The RNA-Seq reads (Read 1 and Read 2) in the Sequence Read Archive, amounting to 46,366,348 paired-end raw reads, form the first data set, made openly available for de novo or comparative transcript assembly. The assembled sequences of 93,960 gene transcripts constitute the second data set, held in the Transcriptome Shotgun Assembly database. The gene and protein annotations of the assembled transcripts yield two data subsets, containing 93,960 GenBank entries and 93,960 GenPept entries, with comprehensive cDNA and translated protein sequences of the genes. Of these, the predicted proteins for 87,768 coding sequences, mapped to UniProtKB, serve as the third data set. The gene expression levels of the annotated transcripts comprise the fourth data set, held in the Gene Expression Omnibus. The fifth data set, in Figshare, includes 44,028 gene ontology terms extracted for 21,073 confident transcripts. The data sets provide a valuable resource for further analyses, including studies of transcriptomic changes in response to environmental stresses.
Highlights
Leaf-tissue-specific transcriptome sequence and de novo assembly data sets of the Asiatic mangrove Rhizophora mucronata Lam.
A combination of Illumina HiSeq 2500, Trinity, BLASTX, Bowtie 2 and Blast2GO was used for RNA-Seq, de novo assembly, transcript annotation, gene expression estimation and gene ontology annotation, respectively.
The RNA-Seq reads (Read 1 and Read 2) in the Sequence Read Archive, amounting to 46,366,348 paired-end raw reads, form the first data set, made openly available for de novo or comparative transcript assembly.
Summary
The data reported here are a compilation of five major data sets and two data subsets. The second data set contains the transcripts assembled de novo from the RNA-Seq reads by Trinity. This Transcriptome Shotgun Assembly project has been deposited under the accession GGEC00000000. The assembly and annotation data were submitted in .sqn format. The fourth data set, in the Gene Expression Omnibus [1], represents the gene expression levels of the annotated transcripts as FPKM (fragments per kilobase of transcript per million mapped reads) and can be downloaded in .txt format (GEO Series accession GSE112162). The first column of the .txt file lists the contig ID assigned to each contig by the de novo assembly. The second and third columns list the GenBank accession of each contig and its FPKM value, respectively. The GO submission uses the GAF version 2 file format.
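The three-column layout of the expression file described above (contig ID, GenBank accession, FPKM) can be read with a few lines of Python. This is a minimal sketch, not the authors' own processing: it assumes the .txt file is tab-delimited and may or may not carry a header line, and the function name `read_fpkm_table` and the sample rows are hypothetical placeholders rather than values from the GEO record.

```python
import csv
import io


def read_fpkm_table(handle, delimiter="\t"):
    """Parse rows of (contig ID, GenBank accession, FPKM) into a list of dicts.

    Assumption: columns are tab-delimited. Rows with fewer than three
    columns, or with a non-numeric third column (e.g. a header), are skipped.
    """
    records = []
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 3:
            continue  # blank or malformed line
        contig_id = row[0].strip()
        accession = row[1].strip()
        try:
            fpkm = float(row[2].strip())
        except ValueError:
            continue  # header line or non-numeric expression value
        records.append(
            {"contig_id": contig_id, "accession": accession, "fpkm": fpkm}
        )
    return records


# Hypothetical placeholder rows, not real accessions or expression values.
sample = (
    "contig_id\taccession\tFPKM\n"
    "contig_1\tACC_PLACEHOLDER_1\t12.5\n"
    "contig_2\tACC_PLACEHOLDER_2\t0.0\n"
    "contig_3\tACC_PLACEHOLDER_3\t3.25\n"
)
records = read_fpkm_table(io.StringIO(sample))

# Keep only contigs with detectable expression (FPKM above zero).
expressed = [r for r in records if r["fpkm"] > 0]
```

To read the downloaded file instead of the sample string, pass an open file handle, for example `open(path, newline="")`, where `path` points to the local copy of the .txt file.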