Abstract

Background: Reproducibility of results can have a significant impact on the acceptance of new technologies in gene expression analysis. With the recent introduction of the so-called next-generation sequencing (NGS) technology and established microarrays, one is able to choose between two completely different platforms for gene expression measurements. This study introduces a novel methodology for gene-ranking stability analysis that is applied to the evaluation of gene-ranking reproducibility on NGS and microarray data.

Results: The same data used in a well-known MicroArray Quality Control (MAQC) study was also used in this study to compare ranked lists of genes from MAQC samples A and B, obtained from Affymetrix HG-U133 Plus 2.0 and Roche 454 Genome Sequencer FLX platforms. An initial evaluation, where the percentage of overlapping genes was observed, demonstrates higher reproducibility on microarray data in 10 out of 11 gene-ranking methods. A gene set enrichment analysis shows similar enrichment of top gene sets when NGS is compared with microarrays on a pathway level. Our novel approach demonstrates high accuracy of decision trees when used for knowledge extraction from multiple bootstrapped gene set enrichment analysis runs. A comparison of the two approaches in sample preparation for high-throughput sequencing shows that alternating decision trees represent the optimal knowledge representation method in comparison with classical decision trees.

Conclusions: Usual reproducibility measurements are mostly based on statistical techniques that offer very limited biological insights into the studied gene expression data sets. This paper introduces the meta-learning-based gene set enrichment analysis that can be used to complement the analysis of gene-ranking stability estimation techniques such as percentage of overlapping genes or classic gene set enrichment analysis. It is useful and practical when reproducibility of gene ranking results or different gene selection techniques is observed. The proposed method reveals very accurate descriptive models that capture the co-enrichment of gene sets which are differently enriched in the compared data sets.

Highlights

  • Reproducibility of results can have a significant impact on the acceptance of new technologies in gene expression analysis

  • It is important to check whether next-generation sequencing (NGS) data follow similar characteristics to microarray data sets

  • No comparable study has evaluated the stability of NGS on MicroArray Quality Control (MAQC) data using the percentage of overlapping genes (POG) metric



Introduction

Reproducibility of results can have a significant impact on the acceptance of new technologies in gene expression analysis. In a related empirical study, concordance was measured as the overlap between the selected genes for different settings of the n top genes. Among other conclusions, that study showed once again that gene rankings produced by different gene selection methods may differ considerably. A similar study, conducted by Qiu et al. [5], evaluated the stability of differentially expressed genes using the frequency with which a given gene is selected across subsamples. They showed that re-sampling can be an appropriate technique for determining a set of genes with sufficiently high selection frequency, and they recommended using re-sampling techniques to assess the variability of different performance indicators.
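
The two stability measures mentioned above, overlap among the n top genes and selection frequency under re-sampling, can be illustrated with a minimal Python sketch. It is not the implementation used in the studies cited here. The function names `pog` and `selection_frequency`, the normalization of the overlap by n, the use of bootstrap resampling with replacement, and the user-supplied ranking function `rank_fn` are all assumptions made for illustration.

```python
import random
from collections import Counter


def pog(ranked_a, ranked_b, n):
    """Percentage of overlapping genes among the top n of two ranked lists.

    Assumption: the overlap is normalized by n; other normalizations exist.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    top_a = set(ranked_a[:n])
    top_b = set(ranked_b[:n])
    return 100.0 * len(top_a & top_b) / n


def selection_frequency(rank_fn, samples, n, n_resamples=100, seed=0):
    """Fraction of resamples in which each gene appears among the top n.

    rank_fn is a hypothetical callable that takes a list of samples and
    returns gene identifiers ordered from most to least relevant.
    Assumption: resampling is done with replacement (bootstrap).
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_resamples):
        # Draw a bootstrap resample of the same size as the original data.
        resample = [rng.choice(samples) for _ in samples]
        counts.update(rank_fn(resample)[:n])
    return {gene: count / n_resamples for gene, count in counts.items()}


# Toy ranked lists standing in for two platforms (hypothetical gene names).
list_a = ["g1", "g2", "g3", "g4"]
list_b = ["g2", "g1", "g5", "g3"]
for n in (2, 3, 4):
    print(n, round(pog(list_a, list_b, n), 1))
```

Computing `pog` over a range of n values gives a simple concordance curve, and genes whose `selection_frequency` stays high across resamples are candidates for a stable gene set.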
