Abstract

Reconstruction and annotation of transcripts, particularly for a species without a reference genome, play a critical role in gene discovery, investigation of genomic signatures, and genome annotation in the pre-genomic era. This is the first study to use single-molecule real-time (SMRT) sequencing to report the full-length transcriptome of Portunus pelagicus. Overall, 16.26 Gb of raw reads were obtained, including 7,068,387 subreads with an average length of 2,300 bp and an N50 length of 3,594 bp. In total, 351,870 circular consensus sequence (CCS) reads were extracted, including 255,378 full-length non-chimeric (FLNC) reads with a mean length of 3,423 bp. After redundant sequences were eliminated, 70,407 genes were obtained; 56,557 (80.33%) of these were annotated in at least one database, and 17,267 (24.52%) were annotated in all seven databases. Further, 68,797 coding sequences (CDS) were identified, including 36,848 complete CDS. A total of 1,730 unigenes were predicted to be transcription factors (TFs). Finally, 11,894 long noncoding RNA (lncRNA) transcripts were predicted using several computational approaches, and 147,262 simple sequence repeats (SSRs) were obtained. The transcriptome data reported herein are expected to serve as a basis for future studies on P. pelagicus.
