Abstract

Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how these methods perform on TempO-Seq data. We simulated count data in two groups (treated vs. untreated) at seven fold-change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq, and normalized the data using seven normalization methods. Upper Quartile (UQ) normalization performed the best at maintaining FC levels, as detected by a limma contrast between the treated and untreated groups. For all FC levels, the specificity of UQ normalization was greater than 0.84 and the sensitivity was greater than 0.90, except for the no change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite assuming that the majority of genes are unchanged, the DESeq2 scaling factors normalization method performed reasonably well, as did the simple normalization procedures counts per million (CPM) and total counts (TC). These results suggest that for two-class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers in normalizing TempO-Seq gene expression data for more reliable results.
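To make the three simplest scaling methods named above concrete, the following is a minimal sketch and not the authors' implementation. It assumes the commonly described forms of each method: CPM divides by library size and multiplies by one million, TC divides by library size and rescales by the mean library size, and UQ divides by each sample's 75th percentile (computed after dropping genes with zero counts in every sample) and rescales by the mean upper quartile. The function names, the linear-interpolation percentile, and the toy counts are all illustrative assumptions.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 1]) of a non-empty list."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def cpm(samples):
    """Counts per million: scale each sample by its own library size."""
    return [[c / sum(s) * 1e6 for c in s] for s in samples]


def total_count(samples):
    """Total counts: divide by library size, rescale by the mean library size."""
    sizes = [sum(s) for s in samples]
    mean_size = sum(sizes) / len(sizes)
    return [[c / n * mean_size for c in s] for s, n in zip(samples, sizes)]


def upper_quartile(samples):
    """Upper quartile: divide by the per-sample 75th percentile of counts.

    Assumption: genes with zero counts in all samples are excluded before
    the quartile is computed. A sample whose upper quartile is zero would
    need special handling, which this sketch omits.
    """
    n_genes = len(samples[0])
    keep = [g for g in range(n_genes) if any(s[g] > 0 for s in samples)]
    uqs = [percentile([s[g] for g in keep], 0.75) for s in samples]
    mean_uq = sum(uqs) / len(uqs)
    return [[c / uq * mean_uq for c in s] for s, uq in zip(samples, uqs)]


# Toy data (hypothetical): the second sample is the first sequenced twice as deep.
samples = [[10, 20, 30, 40, 0], [20, 40, 60, 80, 0]]
cpm_norm = cpm(samples)
tc_norm = total_count(samples)
uq_norm = upper_quartile(samples)
```

In this toy case the two samples differ only in sequencing depth, so each method maps them to the same normalized values. The methods diverge once a subset of highly expressed genes changes between groups, which is the situation the simulation in the abstract probes.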

Highlights

  • Over the past 25 years, interrogation of genome-wide gene expression has taken many forms. Analysis methods for cDNA and oligonucleotide microarrays (Millen and Glauser, 1978; Lockhart et al, 1996) matured over time, to the point that preprocessing of single-channel microarray data defaulted to the de facto standard, Robust Multichip Average (RMA) normalization (Irizarry et al, 2003a,b)

  • Crowdsourced bioinformatics analysis of RNA sequencing (RNA-Seq) data through the SEquence Quality Control (SEQC) phase of the US Food and Drug Administration MicroArray Quality Control (MAQC) effort led to a comprehensive assessment of RNA-Seq analysis, including comparison to microarrays and normalization using External RNA Control Consortium (ERCC) spike-in controls (Consortium, 2014; Risso et al, 2014; Wang et al, 2014; Xu et al, 2014)

  • We show that, based on sensitivity and specificity as performance measures and the adjusted Rand index (ARI) as a measure of agreement, Upper Quartile (UQ) normalization performed the best at maintaining absolute fold change (FC) levels ≥2.0 as detected in a two-group comparison
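The performance measures named in the highlights can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: `truth` and `called` are hypothetical per-gene flags (whether a gene was simulated as changed, and whether the differential expression test detected it), and the ARI is computed with the standard pair-counting formula from two label lists such as FC assignments and K-means cluster labels.

```python
from collections import Counter
from math import comb


def sensitivity_specificity(truth, called):
    """Return (sensitivity, specificity) from paired boolean flags."""
    tp = sum(1 for t, c in zip(truth, called) if t and c)
    fn = sum(1 for t, c in zip(truth, called) if t and not c)
    tn = sum(1 for t, c in zip(truth, called) if not t and not c)
    fp = sum(1 for t, c in zip(truth, called) if not t and c)
    return tp / (tp + fn), tn / (tn + fp)


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_a)
    # Pairs of items placed together in both labelings (contingency cells).
    together = sum(comb(k, 2) for k in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each labeling separately.
    pairs_a = sum(comb(k, 2) for k in Counter(labels_a).values())
    pairs_b = sum(comb(k, 2) for k in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. both labelings put every item in one group.
        return 1.0
    return (together - expected) / (maximum - expected)


# Hypothetical flags for four genes: two truly changed, two unchanged.
sens, spec = sensitivity_specificity(
    [True, True, False, False], [True, False, False, True]
)

# Identical partitions score 1 regardless of how the groups are named.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ari_diff = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the ARI is corrected for chance, it can be negative when two labelings agree less than random assignment would, as in the last toy example.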



Introduction

Over the past 25 years, interrogation of genome-wide gene expression has taken many forms. Analysis methods for cDNA and oligonucleotide microarrays (Millen and Glauser, 1978; Lockhart et al, 1996) matured over time, to the point that preprocessing of single-channel microarray data defaulted to the de facto standard, Robust Multichip Average (RMA) normalization (Irizarry et al, 2003a,b). In the last few years, targeted sequencing of RNA has emerged as a practical means of capturing the totality of the transcriptomic space with less reliance on large resources for consumables and bioinformatics (Li et al, 2012). The TempO-Seq™ technology from BioSpyder™ is a templated, multiplexed RNA-Seq platform that measures the expression of sentinel genes representative of genome-wide transcription (Yeakley et al, 2017; Mav et al, 2018). Among the advantages of TempO-Seq over RNA-Seq are that it does not require RNA purification, cDNA synthesis, or capture of targeted RNA. There has not been a comprehensive comparison of normalization methods applied to TempO-Seq data.
