RNA-seq assembler artifacts can bias expression counts and differential expression analysis - application of YeATS on the chickpea transcriptome

Sandeep Chakraborty

doi:10.12688/f1000research.9667.1

Abstract

Background: The unprecedented volume of genomic and transcriptomic data analyzed by software pipelines makes verification of inferences based on such data, albeit theoretically possible, a challenging proposition. The availability of intermediate data can immensely aid re-validation efforts. One such example is the transcriptome, assembled from raw RNA-seq reads, which is frequently used for annotation and quantification of transcribed genes. The quality of the assembled transcripts influences the accuracy of inferences based on them.

Method: Here the publicly available transcriptome from Cicer arietinum (ICC4958; Desi chickpea, http://www.nipgr.res.in/ctdb.html) was analyzed using YeATS.

Results and Conclusion: The analysis revealed that a majority of the highly expressed transcripts (HET) encoded multiple genes, strongly indicating that the counts may have been biased by the merging of different transcripts. TC00004 is ranked in the top five HET for all five tissues analyzed here, and encodes both a retinoblastoma-binding-like protein (E-value = 0) and a senescence-associated protein (E-value = 5e-108). Fragmented transcripts are another source of error. The ribulose bisphosphate carboxylase small chain (RBCSC) protein is split into two transcripts, TC13991 and TC23009, which share the overlapping amino acid sequence "ASNGGRVHC". Their lengths are 201 and 332 nucleotides and their expression counts are 17.90 and 1403.8, respectively. The large difference in counts points to an error in the normalization algorithm used to determine counts. It is well known that RBCSC is highly expressed and, as expected, TC23009 ranks fifth among HETs in the shoot. Furthermore, some transcripts are split into open reading frames that map to the same protein, although this should not have any significant bearing on the counts.

It is proposed that studies analyzing differential expression based on the transcriptome should consider these artifacts, and that providing intermediate assembled transcriptomes should be mandatory, possibly with a link to the raw sequence data (BioProject).
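The count discrepancy between the two RBCSC fragments can be checked with simple arithmetic. The sketch below uses only the lengths and counts quoted in the abstract; the variable names are illustrative, and the source does not state whether the database counts are raw or already length-normalized, so both readings are considered.

```python
# Values quoted in the abstract for the two RBCSC fragments.
length_tc13991, count_tc13991 = 201, 17.90
length_tc23009, count_tc23009 = 332, 1403.8

# How much longer is TC23009, and how much higher is its count?
length_ratio = length_tc23009 / length_tc13991
count_ratio = count_tc23009 / count_tc13991

# If counts were raw, fragments of one gene should differ roughly by length_ratio.
# If counts were already length-normalized, they should be roughly equal (ratio near 1).
print(f"length ratio: {length_ratio:.2f}")  # about 1.65
print(f"count ratio:  {count_ratio:.1f}")   # about 78.4
```

Under either reading, a count ratio near 78 is far larger than the length difference of about 1.65 would explain, which is consistent with the abstract's concern about how the counts were determined.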

Highlights

The lack of reproducibility of results in biology is a contentious subject[1,2]
Several online resources exist for chickpea genomes and transcriptomes
The top five highly expressed transcripts (HET) from five tissues - flower bud (FB), mature leaf (ML), root (RT), shoot (SH), young plant (YP) - were obtained from http://www.nipgr.res.in/ctdb.html
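Finding transcripts that rank among the top HET in every tissue amounts to taking a per-tissue top-N list and intersecting the lists. The following is a minimal sketch of that idea, not the procedure used by the database or by YeATS; the function names, transcript identifiers, and count values are hypothetical toy data, and only the tissue codes come from the text above.

```python
from typing import Dict, List, Set


def top_n(counts: Dict[str, float], n: int = 5) -> List[str]:
    """Return the n transcript IDs with the highest counts."""
    return sorted(counts, key=lambda tid: counts[tid], reverse=True)[:n]


def shared_top(per_tissue: Dict[str, Dict[str, float]], n: int = 5) -> Set[str]:
    """Return transcript IDs that are in the top n of every tissue."""
    tops = [set(top_n(counts, n)) for counts in per_tissue.values()]
    return set.intersection(*tops) if tops else set()


# Hypothetical toy counts (NOT real chickpea data), three tissues, top 2 only.
toy = {
    "FB": {"tx_A": 900.0, "tx_B": 40.0, "tx_C": 300.0},
    "ML": {"tx_A": 750.0, "tx_B": 820.0, "tx_C": 10.0},
    "RT": {"tx_A": 500.0, "tx_B": 5.0, "tx_C": 60.0},
}
shared = shared_top(toy, n=2)
print(shared)
```

In this toy input only `tx_A` is in the top two of all three tissues, which mirrors the role the text describes for TC00004 across the five real tissues.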

Summary

Introduction

The lack of reproducibility of results in biology is a contentious subject[1,2]. In computational studies, exact replication of the output of most computer programs is difficult, as most non-trivial algorithms use heuristics. The availability of intermediate data can immensely aid re-validation efforts. One such example is the transcriptome, assembled from raw RNA-seq reads, which is frequently used for annotation and quantification of transcribed genes. The analysis revealed that a majority of the highly expressed transcripts (HET) encoded multiple genes, strongly indicating that the counts may have been biased by the merging of different transcripts. TC00004 is ranked in the top five HET for all five tissues analyzed here, and encodes both a retinoblastoma-binding-like protein (E-value = 0) and a senescence-associated protein (E-value = 5e-108). Fragmented transcripts are another source of error. It is proposed that studies analyzing differential expression based on the transcriptome should consider these artifacts, and that providing intermediate assembled transcriptomes should be mandatory, possibly with a link to the raw sequence data (BioProject).
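A fragmented transcript can be recognized when the translated end of one fragment overlaps the translated start of another, as the abstract reports for the RBCSC fragments sharing the peptide ASNGGRVHC. The sketch below illustrates a plain suffix-prefix overlap check; it is not the YeATS implementation. The function name is hypothetical, the flanking residues are written as X (unknown) because the source gives only the shared peptide, and which fragment is N-terminal is an assumption.

```python
def suffix_prefix_overlap(left: str, right: str, min_len: int = 5) -> str:
    """Return the longest suffix of `left` that is also a prefix of `right`.

    Returns an empty string if no overlap of at least `min_len` residues exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_len - 1, -1):
        if left.endswith(right[:size]):
            return right[:size]
    return ""


# Only the shared peptide is from the source; X marks unknown flanking residues.
frag_a = "XXXXXX" + "ASNGGRVHC"   # assumed N-terminal fragment
frag_b = "ASNGGRVHC" + "XXXXXX"   # assumed C-terminal fragment

overlap = suffix_prefix_overlap(frag_a, frag_b)
merged = frag_a + frag_b[len(overlap):]  # join the fragments across the overlap
print(overlap, merged)
```

The `min_len` threshold guards against short chance matches; a real analysis would also confirm that both fragments align to the same reference protein.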


Journal: F1000Research
Publication Date: Sep 27, 2016
License type: CC BY 4.0
