Abstract

Myrciaria dubia “camu-camu” is a native shrub of the Amazon that is commonly found in areas that are flooded for three to four months during the annual hydrological cycle. This plant species is exceptional for its capacity to biosynthesize and accumulate substantial quantities of a variety of health-promoting phytochemicals, especially vitamin C [1], yet few genomic resources are available [2]. Here we provide the dataset of a de novo assembly and functional annotation of the transcriptome from a pool of samples obtained from seeds during the germination process and seedlings during the initial growth (until one month after germination). Total RNA/mRNA was purified from different types of plant materials (i.e., imbibed seeds, germinated seeds, and one-, two-, three-, and four-week-old seedlings) and pooled in equimolar ratio to generate the cDNA library. Paired-end RNA sequencing was conducted on an Illumina HiSeq™ 2500 platform. The transcriptome was de novo assembled using Trinity v2.9.1 and SuperTranscripts v2.9.1. A total of 21,161 transcripts were assembled, ranging in size from 500 to 10,001 bp, with an N50 value of 1,485 bp. Completeness of the assembly dataset was assessed using the Benchmarking Universal Single-Copy Orthologs (BUSCO) software v2/v3. Finally, the assembled transcripts were functionally annotated using TransDecoder v3.0.1 and the web-based platforms Kyoto Encyclopedia of Genes and Genomes (KEGG) Automatic Annotation Server (KAAS) and FunctionAnnotator. The raw reads were deposited into NCBI and are accessible via BioProject accession number PRJNA615000 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA615000) and Sequence Read Archive (SRA) accession number SRX7990430 (https://www.ncbi.nlm.nih.gov/sra/SRX7990430). Additionally, transcriptome shotgun assembly sequences and functional annotations are available via Discover Mendeley Data (https://data.mendeley.com/datasets/2csj3h29fr/1).
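The N50 value quoted above is a length-weighted summary of the assembly: it is the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of the total assembled bases. The following is a minimal sketch of that calculation, not the authors' script. The function name `n50` and the toy lengths are hypothetical and are not taken from the dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk from the longest sequence downwards, accumulating bases.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which half of all bases are covered is the N50.
        if running >= half_total:
            return length


# Hypothetical toy lengths (bp), for illustration only.
toy_lengths = [500, 800, 1200, 1500, 3000]
print(n50(toy_lengths))
```

Applied to the lengths of all 21,161 assembled transcripts, a calculation of this kind would be expected to give the reported 1,485 bp, assuming the standard N50 definition was used.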

Highlights

  • Myrciaria dubia “camu-camu” is a native shrub of the Amazon that is commonly found in areas that are flooded for three to four months during the annual hydrological cycle

  • We provide the dataset of a de novo assembly and functional annotation of the transcriptome from a pool of samples obtained from seeds during the germination process and seedlings during the initial growth (until one month after germination)

  • RNA/mRNA was purified from different types of plant materials and pooled in equimolar ratio to generate the cDNA library; paired-end RNA sequencing was conducted on an Illumina HiSeq™ 2500 platform


Summary

Data description

In this dataset, the de novo assembly and functional annotation of the transcriptome during germination and initial seedling growth of M. dubia “camu-camu” are reported for the first time. The de novo assembled transcripts were functionally annotated. FunctionAnnotator obtained 20,382 best hits from the NCBI non-redundant protein database, together with their taxonomic distribution. Of these, 18,050 transcripts mapped to Gene Ontology terms in the three classes (Fig. 4, Table S2): biological process (15,353), cellular component (15,401), and molecular function (14,354). In addition, 2,357 transcripts were identified as coding for enzymes, totalling 680 different enzymes across the six classes, and 16,091 transcripts code for at least one protein domain (4,838 different domains were identified). Transcriptome shotgun assembly sequences and functional annotations are available via Discover Mendeley Data (https://data.mendeley.com/datasets/2csj3h29fr/1).
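To put these counts in proportion, they can be expressed as a share of the assembly. The sketch below is simple arithmetic on the figures reported here. It assumes that each count refers to distinct transcripts among the 21,161 assembled transcripts given in the abstract, which the text does not state explicitly. The dictionary keys are hypothetical labels.

```python
# Total assembled transcripts, as reported in the abstract.
TOTAL_TRANSCRIPTS = 21_161

# Annotation counts reported in the data description.
annotation_counts = {
    "nr_best_hits": 20_382,
    "gene_ontology": 18_050,
    "protein_domains": 16_091,
    "enzymes": 2_357,
}


def percent_of_assembly(count, total=TOTAL_TRANSCRIPTS):
    """Express a transcript count as a percentage of the whole assembly."""
    return 100 * count / total


coverage = {name: percent_of_assembly(c) for name, c in annotation_counts.items()}
for name, pct in coverage.items():
    print(f"{name}: {pct:.1f}%")
```

Under that assumption, most of the assembled transcripts received at least one database hit, while enzyme-coding transcripts make up a much smaller share.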
