Abstract

Background: When the reads obtained from high-throughput RNA sequencing are mapped against a reference database, a significant proportion of them, known as multireads, can map to more than one reference sequence. These multireads originate from gene duplications, repetitive regions or overlapping genes. In RNA-Seq analyses, removing the multireads from the mapping results causes an underestimation of the read counts, while estimating the real read count can lead to false positives during the detection of differentially expressed sequences.

Results: We present an innovative approach, entirely based on fuzzy set theory, to deal with multireads and evaluate differential expression events. Since multireads cause uncertainty in the estimation of read counts during gene expression computation, they can also reduce the reliability of differential expression analysis results by producing false positives. Our method manages the uncertainty in gene expression estimation by defining fuzzy read counts, and it evaluates the possibility that a gene is differentially expressed with three fuzzy concepts: over-expression, same-expression and under-expression. The output of the method is a list of differentially expressed genes enriched with information about the uncertainty of the results due to the presence of multireads. We have tested the method on RNA-Seq data designed for case-control studies and compared the results with those of other existing tools for read count estimation and differential expression analysis.

Conclusions: Managing multireads with fuzzy sets makes it possible to obtain a list of differential expression events that takes into account the uncertainty in the results caused by the presence of multireads. Biologists can use this additional information when selecting the most relevant differential expression events to validate with laboratory assays. Our method can be used to compute reliable differential expression events and to highlight possible false positives in the lists of differentially expressed genes computed with other tools.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1195-2) contains supplementary material, which is available to authorized users.

Highlights

  • Introduction of biological and technical replicates: technical replicates must be merged and considered as a single experiment

  • Two preliminary studies have been performed in order to better understand the nature of multireads: an evaluation of the presence of overlapping portions among the genes and an examination of the variability of fold change results correlated to the magnitude of gene expression

  • In this paper we have presented a method for dealing with the problem of multireads without statistical assumptions and probability estimations


Summary

Introduction

Introduction of biological and technical replicates

Technical replicates must be merged and considered as a single experiment. As in classic read counting, the technical replicates can be merged before or after the mapping step, and the counts can be merged by summation. The same can be done with fuzzy gene counts, by summing the four values obtained for each gene:

Tr[A′ + A′′, B′ + B′′, C′ + C′′, D′ + D′′]    (3)

where Tr[A′,B′,C′,D′] and Tr[A′′,B′′,C′′,D′′] are the fuzzy read counts for the same gene in two technical replicates. Biological replicates are different samples belonging to the same condition. This case requires a more reasoned approach but, as a preliminary method, we propose to merge the fuzzy sets so as to cover all the possible values.

A typical differential expression (DE) analysis workflow is composed of three main steps: (1) read mapping, (2) gene expression computation and (3) identification of noticeable differences between the samples. Several normalization techniques [2,3,4] and DE analysis models [5,6,7] have been designed for sequence count data.
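The component-wise sum of fuzzy read counts described above for technical replicates can be illustrated with a short Python sketch. It assumes that Tr[A,B,C,D] denotes a fuzzy count defined by four ordered values A ≤ B ≤ C ≤ D; the class name FuzzyCount, the function merge_technical_replicates and the numbers in the usage example are hypothetical and are not taken from the paper or its software. The sketch covers only the technical-replicate sum, not the merge proposed for biological replicates.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FuzzyCount:
    """Fuzzy read count Tr[a, b, c, d] for one gene (hypothetical helper type)."""
    a: float
    b: float
    c: float
    d: float

    def __post_init__(self) -> None:
        # Assumption: the four values are ordered, as for a trapezoidal fuzzy set.
        if not (self.a <= self.b <= self.c <= self.d):
            raise ValueError("expected a <= b <= c <= d")

    def __add__(self, other: "FuzzyCount") -> "FuzzyCount":
        # Sum the four values position by position.
        return FuzzyCount(
            self.a + other.a,
            self.b + other.b,
            self.c + other.c,
            self.d + other.d,
        )


def merge_technical_replicates(counts: Iterable[FuzzyCount]) -> FuzzyCount:
    """Merge the fuzzy counts of one gene across technical replicates by summation."""
    counts = list(counts)
    if not counts:
        raise ValueError("at least one replicate is required")
    total = counts[0]
    for count in counts[1:]:
        total = total + count
    return total


# Usage with made-up values for the same gene in two technical replicates.
replicate_1 = FuzzyCount(10, 12, 20, 25)
replicate_2 = FuzzyCount(8, 11, 18, 30)
merged = merge_technical_replicates([replicate_1, replicate_2])
print(merged)
```

Because each of the four values is summed independently, the ordering a ≤ b ≤ c ≤ d is preserved in the merged count, so the result is again a valid fuzzy count under the stated assumption.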

