YeATS - a tool suite for analyzing RNA-seq derived transcriptome identifies a highly transcribed putative extensin in heartwood/sapwood transition zone in black walnut

Sandeep Chakraborty, Jill Wegrzyn, Keith Woeste, Timothy Butterfield, Abhaya M Dandekar, Charles A Leslie, Basuthkar J Rao, David Neale, Monica Britton, Mallikarjuna Aradhaya

doi:10.12688/f1000research.6617.1

Abstract

The transcriptome provides a functional footprint of the genome by enumerating the molecular components of cells and tissues. The field of transcript discovery has been revolutionized through high-throughput mRNA sequencing (RNA-seq). Here, we present a methodology that replicates and improves existing methodologies, and implements a workflow for error estimation and correction followed by genome annotation and transcript abundance estimation for RNA-seq derived transcriptome sequences (YeATS - Yet Another Tool Suite for analyzing RNA-seq derived transcriptome). A unique feature of YeATS is the upfront determination of errors in the sequencing or transcript assembly process by analyzing the open reading frames of transcripts. YeATS flags as erroneous those transcripts that have not been merged, that result in broken open reading frames, or that contain long repeats. We present the YeATS workflow using a representative sample of the transcriptome from the tissue at the heartwood/sapwood transition zone in black walnut. A novel feature of the transcriptome that emerged from our analysis was a highly abundant transcript with no known homologous genes (GenBank accession: KT023102). The amino acid composition of the longest open reading frame of this gene classifies it as a putative extensin. We also corroborated the transcriptional abundance of proline-rich proteins, dehydrins, senescence-associated proteins, and the DNAJ family of chaperone proteins. Thus, YeATS presents a workflow for analyzing RNA-seq data with several innovative features that differentiate it from existing software.

Highlights

Analysis of the complete set of RNA molecules in a cell, the transcriptome, is critical to understanding the functional aspects of the genome of an organism
The input dataset to the YeATS tool was a set of transcripts, transcript identifiers and their corresponding raw counts, obtained from the tissue at the heartwood/sapwood transition zone (TZ) in black walnut (Juglans nigra L.) (Figure 2)
README
FASTADIR.tgz : 24k transcripts
ORFS.tgz : open reading frames from the 24k transcripts, computed with the ‘getorf’ tool from the EMBOSS suite
list.merged.txt : transcripts that have been merged based on overlapping ends
High.TZ.genome.annotated.csv : transcripts having only one ORF with a high significance match
Lower.TZ.genome.annotated.csv : transcripts having only one ORF with a lower significance match
TZ.genome.annotated.none.csv : transcripts with no match
TZ.genome.errors : transcripts which have two ORFs matching with high significance to the same gene
TZ.genome.annotated.morethanone.csv : transcripts having more than one ORF, matching different genes with high significance
rawcounts.TZ : raw counts
rawcounts.normalized.TZ : normalized counts
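
The way the README partitions transcripts by their ORF matches can be illustrated with a minimal Python sketch. This is not the YeATS implementation: the `OrfHit` record, the `classify` function, and both E-value cutoffs are hypothetical and chosen only for illustration, since the source does not state how "high" and "lower" significance are defined. The sketch also simplifies the "lower" category by accepting any transcript whose only matches are of lower significance.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical E-value cutoffs; the source does not state the thresholds.
HIGH_EVALUE = 1e-10
LOW_EVALUE = 1e-5


@dataclass(frozen=True)
class OrfHit:
    transcript: str  # transcript identifier
    orf: str         # ORF identifier within the transcript
    gene: str        # best-matching gene for this ORF
    evalue: float    # significance of the match (smaller is better)


def classify(transcripts, hits):
    """Map each transcript identifier to one of the README categories."""
    by_transcript = defaultdict(list)
    for hit in hits:
        by_transcript[hit.transcript].append(hit)

    labels = {}
    for name in transcripts:
        found = by_transcript.get(name, [])
        high = [h for h in found if h.evalue <= HIGH_EVALUE]
        lower = [h for h in found if HIGH_EVALUE < h.evalue <= LOW_EVALUE]
        high_orfs = {h.orf for h in high}
        high_genes = {h.gene for h in high}

        if len(high_orfs) >= 2 and len(high_genes) == 1:
            # Several ORFs hit the same gene: the ORF is probably broken.
            labels[name] = "errors"
        elif len(high_orfs) >= 2:
            # Several ORFs hit different genes.
            labels[name] = "morethanone"
        elif len(high_orfs) == 1:
            labels[name] = "high"
        elif lower:
            labels[name] = "lower"
        else:
            labels[name] = "none"
    return labels
```

Each label corresponds to one of the output files listed above (for example, "errors" to TZ.genome.errors and "none" to TZ.genome.annotated.none.csv). Separating the "errors" case first reflects the idea stated in the abstract that broken open reading frames signal a sequencing or assembly error rather than a genuine multi-gene transcript.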

Summary

Introduction

Analysis of the complete set of RNA molecules in a cell, the transcriptome, is critical to understanding the functional aspects of the genome of an organism. Non-translated transcripts (noncoding RNAs) may be alternatively spliced and/or broken into smaller RNAs, the importance of which has only recently been recognized[2]. Transcriptional levels vary significantly with environmental cues[3] and/or disease[4]. Quantifying transcriptional levels is therefore an important methodology in current biological research. Traditional methods such as RNA:DNA hybridization[5] and short sequence-based approaches[6] have recently been supplanted by a high-throughput DNA sequencing method, RNA-seq[7,8]. Concomitant with the introduction of RNA-seq has been the development of a diverse set of computational methods for analyzing the resultant data[9,10,11,12,13,14,15,16,17,18,19,20,21].


Journal: F1000Research	Publication Date: Jun 17, 2015
Citations: 10	License type: CC BY 4.0
