Abstract

Background: High-throughput techniques bring novel tools and also statistical challenges to genomic research. Identifying genes with differential expression between different species is an effective way to discover evolutionarily conserved transcriptional responses. To remove systematic variation between different species for a fair comparison, normalization serves as a crucial pre-processing step that adjusts for the varying sample sequencing depths and other confounding technical effects.

Results: In this paper, we propose a scale based normalization (SCBN) method that takes into account the available knowledge of conserved orthologous genes and uses the hypothesis testing framework. To account for the different gene lengths and unmapped genes between different species, we formulate the problem from the perspective of hypothesis testing and search for the optimal scaling factor that minimizes the deviation between the empirical and nominal type I errors.

Conclusions: Simulation studies show that the proposed method performs significantly better than the existing competitor in a wide range of settings. An analysis of an RNA-seq dataset from different species is consistent with the conclusion that the proposed method outperforms the existing method. For practical applications, we have also developed an R package named “SCBN”, which is freely available at http://www.bioconductor.org/packages/devel/bioc/html/SCBN.html.

Highlights

  • High-throughput techniques bring novel tools and statistical challenges to genomic research

  • For the setting of different species, we develop a scale based normalization (SCBN) method by utilizing the available knowledge of conserved orthologous genes and the hypothesis testing framework

  • We propose a novel normalization method for RNA-seq data from different species by utilizing the available knowledge of conserved orthologous genes and the hypothesis testing framework

Introduction

Zhou et al. BMC Bioinformatics (2019) 20:163

High-throughput techniques bring novel tools and statistical challenges to genomic research. To remove systematic variation between different species for a fair comparison, normalization serves as a crucial pre-processing step that adjusts for the varying sample sequencing depths and other confounding technical effects. Several recent studies have compared gene expression levels in different organisms using microarray or RNA-seq data. Liu et al. [12] reported a systematic comparison of RNA-seq for detecting differential gene expression between closely related species. Kristiansson et al. [14] proposed a statistical method for meta-analysis of gene expression profiles from different species with RNA-seq data. To make the expression levels of orthologous genes comparable between different species, normalization is a crucial step in the data processing procedure.
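
The scaling-factor search described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation and does not use the SCBN package API. It assumes a simplified model in which reads for each conserved orthologous gene are split between the two species in proportion to gene length times a scale c, tests each gene with a normal approximation to a binomial test, and picks the c on a grid whose rejection rate among conserved genes is closest to the nominal level. All function names, the grid, and the simulated data (including the Gaussian stand-in for Poisson counts) are hypothetical choices made for illustration.

```python
import math
import random


def two_sided_p(x, y, len1, len2, c):
    """Approximate p-value for H0: equal per-base expression once species 2 is scaled by c."""
    n = x + y
    if n == 0:
        return 1.0
    p = len1 / (len1 + c * len2)  # expected share of reads from species 1 under H0
    z = (x - n * p) / math.sqrt(n * p * (1.0 - p))
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail


def empirical_type1(genes, c, alpha):
    """Fraction of conserved genes (assumed non-differential) rejected at level alpha."""
    rejections = sum(1 for g in genes if two_sided_p(*g, c) < alpha)
    return rejections / len(genes)


def estimate_scale(genes, alpha=0.05, grid=None):
    """Grid search for the scale whose empirical type I error is closest to alpha."""
    if grid is None:
        grid = [0.5 + 0.01 * i for i in range(251)]  # candidate scales 0.50 .. 3.00
    return min(grid, key=lambda c: abs(empirical_type1(genes, c, alpha) - alpha))


def simulate_conserved(n_genes, c_true, seed=0):
    """Toy conserved genes as (x, y, len1, len2); counts use a Gaussian stand-in for Poisson."""
    rng = random.Random(seed)
    genes = []
    for _ in range(n_genes):
        len1 = rng.uniform(1.0, 3.0)           # gene length in kb, species 1
        len2 = len1 * rng.uniform(0.8, 1.25)   # ortholog length differs in species 2
        mu = rng.uniform(200.0, 500.0)         # shared expression, reads per kb
        m1 = mu * len1
        m2 = c_true * mu * len2                # species 2 sequenced at c_true times the depth
        x = max(0, round(rng.gauss(m1, math.sqrt(m1))))
        y = max(0, round(rng.gauss(m2, math.sqrt(m2))))
        genes.append((x, y, len1, len2))
    return genes


genes = simulate_conserved(500, c_true=1.5, seed=1)
c_hat = estimate_scale(genes)
print(f"estimated scale: {c_hat:.2f}")
```

In this toy setting the rejection rate among conserved genes climbs quickly as c moves away from the value used to simulate the data, which is what makes the deviation from the nominal level a usable criterion. A grid search is used here only because it is easy to read; any one-dimensional search over c would serve the same purpose.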
