Abstract

Detecting differentially expressed (DE) genes across conditions in RNA-Seq datasets gives insight into the biological processes that differ between those conditions. Most available methods for discovering DE genes use statistical models of the counts of reads that map to individual genes. However, reads can be distributed heterogeneously across the regions of a gene, so summarizing reads at the gene level may give inaccurate results. If genes are instead broken into smaller regions, such as exons or even smaller fragments, and DE analysis is performed on each region, the significance of the gene as a whole can be determined by combining the region-level p-values, which may improve the accuracy of DE gene detection. We therefore evaluated the performance of widely used methods for combining p-values on publicly available RNA-Seq data. The combined p-value methods include Fisher's method, the Z-transform, the Weighted Z-test, the Minimum P-value, the Logit, and the Weighted-sum methods. On the liver and kidney data, the Weighted Z-test performs best, detecting the largest number of truly DE genes; the weights assigned in the Weighted Z-test enable it to outperform Fisher's method. On the MAQC datasets, our analysis indicates that the methods perform similarly, with a slight edge for the Weighted Z-test and Fisher's method in detecting true DE genes. However, the Weighted-sum method clearly performs best in detecting true non-DE genes. Furthermore, on the MAQC datasets, a method's performance in detecting DE genes appears inversely related to its performance in detecting non-DE genes. These results point to difficulties in properly combining high and low p-values, possibly due to a lack of independence between tests. A modified Fisher's method may therefore give more accurate results in these circumstances.
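The gene-level step described above, merging a gene's region-level p-values into one gene-level p-value, can be sketched as follows. This is a minimal Python sketch (standard library only) of three of the listed methods: Fisher's method, the Z-transform and its weighted variant, and the minimum p-value in its common independence-based (Tippett) form. It assumes the region-level p-values are independent, one-sided, and strictly between 0 and 1. The Logit and Weighted-sum methods are omitted. The function names are hypothetical, and the weights in the usage example are placeholders, since the abstract does not say how the weights are chosen.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def fisher(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) follows chi-square(2k) under H0."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # The chi-square survival function for even df 2k has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def weighted_z(pvalues, weights=None):
    """Weighted Z-test; with equal weights this reduces to the Z-transform."""
    if weights is None:
        weights = [1.0] * len(pvalues)
    # z_i = Phi^{-1}(1 - p_i), computed as -Phi^{-1}(p_i) by symmetry
    # (more accurate for very small p-values).
    z = [-_STD_NORMAL.inv_cdf(p) for p in pvalues]
    z_comb = sum(w * zi for w, zi in zip(weights, z)) / math.sqrt(
        sum(w * w for w in weights)
    )
    return _STD_NORMAL.cdf(-z_comb)  # upper-tail p-value of the combined z


def minimum_p(pvalues):
    """Minimum p-value (Tippett): P(min of k uniform p-values <= observed min)."""
    return 1.0 - (1.0 - min(pvalues)) ** len(pvalues)


if __name__ == "__main__":
    # Hypothetical exon-level p-values for a single gene.
    exon_p = [0.01, 0.04, 0.30, 0.50]
    # Placeholder weights; the abstract does not say how weights are chosen.
    exon_w = [3.0, 1.0, 1.0, 2.0]
    print("Fisher:      ", fisher(exon_p))
    print("Z-transform: ", weighted_z(exon_p))
    print("Weighted Z:  ", weighted_z(exon_p, exon_w))
    print("Minimum p:   ", minimum_p(exon_p))
```

With a single p-value, all three functions return that p-value unchanged. Because each formula assumes independent tests, correlated region-level tests can distort the combined result, which is in line with the abstract's closing remark about a lack of independence between tests.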
