Abstract

To improve the applicability of RNA-seq technology, a large number of RNA-seq data analysis methods and correction algorithms have been developed. Although these new methods and algorithms have steadily improved transcriptome analysis, greater prediction accuracy is needed to better guide experimental designs with computational results. In this study, a new tool for the identification of differentially expressed genes with RNA-seq data, named GExposer, was developed. This tool introduces a local normalization algorithm to reduce the bias of nonrandomly positioned read depth. The naive Bayes classifier is employed to integrate fold change, transcript length, and GC content to identify differentially expressed genes. Results on several independent tests show that GExposer has better performance than other methods. The combination of the local normalization algorithm and naive Bayes classifier with three attributes can achieve better results; both false positive rates and false negative rates are reduced. However, only a small portion of genes is affected by the local normalization and GC content correction.
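The classification step described above can be illustrated with a minimal sketch. It assumes a Gaussian naive Bayes model over three per-gene attributes (absolute log2 fold change, transcript length in kb, GC fraction). The abstract does not say how GExposer models each attribute, so the Gaussian likelihood, the function names, and the toy training values below are all hypothetical and are not taken from the tool itself.

```python
import math

def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # Small constant keeps the variance strictly positive (assumption).
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[cls] = (n / len(rows), means, variances)
    return model

def log_posterior(model, row):
    """Unnormalized log posterior of each class for one gene."""
    scores = {}
    for cls, (prior, means, variances) in model.items():
        log_p = math.log(prior)
        for x, m, var in zip(row, means, variances):
            # Naive Bayes: features are treated as independent given the class.
            log_p += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        scores[cls] = log_p
    return scores

def predict(model, row):
    """Return the class label with the highest log posterior."""
    scores = log_posterior(model, row)
    return max(scores, key=scores.get)

# Hypothetical toy genes: (|log2 fold change|, transcript length in kb, GC fraction)
rows = [
    (2.5, 1.8, 0.45), (3.1, 2.4, 0.50), (2.2, 1.2, 0.42),  # labeled DE
    (0.2, 1.5, 0.48), (0.4, 2.0, 0.52), (0.1, 0.9, 0.40),  # labeled NDE
]
labels = ["DE", "DE", "DE", "NDE", "NDE", "NDE"]
model = fit_gaussian_nb(rows, labels)
```

Calling `predict(model, (2.8, 1.6, 0.47))` on this toy model scores a gene with a large fold change against both classes and returns the more probable label. A real analysis would train on many more genes and would need to validate the choice of likelihood.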

Highlights

  • RNA-seq is a technology based on next-generation sequencing that determines transcript abundance, the transcriptional structure of genes, and posttranscriptional modifications

  • A receiver operating characteristic (ROC) curve shows the dependency between sensitivity and (1 − specificity): it plots the true positive rate against the false positive rate at various threshold settings

  • For no-call genes, the model trained on all differentially expressed (DE) and non-differentially expressed (NDE) genes was used to score them with a naive Bayes (NB) classifier
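The ROC curve mentioned in the highlights can be computed directly from classifier scores and known labels. The sketch below is a generic illustration, not the evaluation code used in the study: the function names `roc_points` and `auc` are hypothetical, and labels are assumed to be 1 for DE genes and 0 for NDE genes.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    # Lowering the threshold step by step is the same as walking down the ranking.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last gene of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

Each point corresponds to one score threshold: genes scoring at or above it are called DE. A classifier that ranks every DE gene above every NDE gene yields an area of 1, while tied, uninformative scores yield 0.5.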


Summary

Introduction

RNA-seq is a technology based on next-generation sequencing that determines transcript abundance, the transcriptional structure of genes, and posttranscriptional modifications. RNA-seq data are typically generated from a library of cDNA fragments made from a population of mRNAs. The cDNAs are sequenced en masse, with or without amplification. The resulting short reads are first aligned to a reference genome or transcriptome; then, for each gene, the numbers of mapped reads are compared between two samples. The number of short reads mapped to a gene is the count taken as a measure of that gene's expression level. Many types of analysis can be applied to the results of short-read alignment, including single nucleotide polymorphism discovery, alternative transcript identification, and gene expression profiling.
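The count comparison described above can be sketched as a log2 fold change between two samples. This is a simplified illustration under stated assumptions: counts are scaled to reads per million mapped reads and a pseudocount guards against zero counts. Neither choice is taken from the paper, and the function name and parameters are hypothetical.

```python
import math

def log2_fold_change(count_a, count_b, total_a, total_b, pseudocount=1.0):
    """Log2 ratio of library-size-scaled read counts for one gene."""
    # Scale to reads per million so samples of different depth are comparable
    # (an assumed, simple global normalization).
    rpm_a = (count_a + pseudocount) / total_a * 1e6
    rpm_b = (count_b + pseudocount) / total_b * 1e6
    return math.log2(rpm_a / rpm_b)

# Hypothetical read counts for two genes in two samples of equal depth.
counts_a = {"geneX": 200, "geneY": 50}
counts_b = {"geneX": 100, "geneY": 50}
fold_changes = {
    gene: log2_fold_change(counts_a[gene], counts_b[gene], 1_000_000, 1_000_000)
    for gene in counts_a
}
```

A positive value means the gene has more scaled reads in the first sample, a negative value means fewer, and zero means no difference. Global scaling of this kind does not address position-dependent read depth bias.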
