Abstract

Available statistical preprocessing and quality control analysis tools for gene expression microarray datasets are known to greatly affect downstream data analysis, especially when degraded samples, unique tissue samples, or novel expression assays are used. It is therefore important to assess the validity and impact of the assumptions built into the preprocessing scheme chosen for a dataset. We developed and assessed a data preprocessing strategy for use with the Illumina DASL-based gene expression assay with partially degraded postmortem prefrontal cortex samples. The samples were obtained from individuals with autism as part of an investigation of the pathogenic factors contributing to autism. Using statistical analysis methods and metrics such as those associated with multivariate distance matrix regression and mean inter-array correlation, we developed a preprocessing pipeline for DASL-based gene expression data that detects and accommodates problems with microarray-based gene expression values obtained from degraded brain samples. Key steps in the pipeline included outlier exclusion, data transformation and normalization, and batch effect and covariate corrections. Our goal was to produce a clean dataset for subsequent downstream differential expression analysis. We ultimately settled on transformation and normalization algorithms available in the R/Bioconductor package lumi, based on an assessment of their use in various combinations. A log2-transformed, quantile-normalized, and batch- and seizure-corrected procedure was likely the most appropriate for our data. We empirically tested different components of our proposed preprocessing strategy, and our results suggest that a preprocessing strategy that effectively identifies outliers, normalizes the data, and corrects for batch effects can be applied to all studies, even those pursued with degraded samples.
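To make the transformation and normalization steps named in the abstract concrete, the following is a minimal sketch of a log2 transform followed by quantile normalization. It is not the lumi implementation and not the authors' code: the function names, the `offset` parameter, and the rows-as-probes layout are assumptions made for illustration, and ties are handled naively rather than by averaging.

```python
import math


def log2_transform(matrix, offset=1.0):
    """Return log2(value + offset) for every entry.

    `offset` is a hypothetical guard against log2(0); it is an
    assumption of this sketch, not a setting taken from the paper.
    """
    return [[math.log2(v + offset) for v in row] for row in matrix]


def quantile_normalize(matrix):
    """Force every array (column) to share one empirical distribution.

    `matrix` holds probes in rows and arrays in columns. Tied values
    within a column receive adjacent ranks instead of an averaged rank.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[matrix[i][j] for i in range(n_rows)] for j in range(n_cols)]

    # Reference distribution: the mean across arrays at each rank.
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(sc[rank] for sc in sorted_cols) / n_cols for rank in range(n_rows)
    ]

    # Replace each value with the reference value at its within-array rank.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = reference[rank]
    return result


if __name__ == "__main__":
    raw = [[5.0, 4.0, 3.0], [2.0, 1.0, 4.0], [3.0, 4.0, 6.0], [4.0, 2.0, 8.0]]
    for row in quantile_normalize(log2_transform(raw)):
        print(row)
```

After normalization, every column contains the same set of values, so differences between arrays reflect only the rank ordering of probes within each array.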

Highlights

  • Great strides have been made in the development of gene expression profiling technologies that can accommodate partially degraded mRNA samples (Fan et al, 2004; April et al, 2009)

  • Using statistical analysis methods and metrics such as those associated with multivariate distance matrix regression and mean inter-array correlation, we developed a DASL-based assay gene expression preprocessing pipeline to accommodate and detect problems with microarray-based gene expression values obtained with degraded brain samples

  • Our results suggest that a preprocessing strategy that effectively identifies outliers, normalizes the data, and corrects for batch effects can be fashioned for gene expression assays designed to accommodate degraded samples
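The mean inter-array correlation metric mentioned in the highlights can be illustrated with a short sketch: compute the Pearson correlation between every pair of arrays, average each array's correlations with all others, and flag arrays whose average falls well below the rest. The function names and the `n_sd` cutoff are hypothetical choices for this example; the source does not state the threshold the authors used.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length vectors.

    Assumes neither vector is constant (a constant vector would make
    the denominator zero).
    """
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_iac(arrays):
    """Mean correlation of each array with every other array.

    `arrays` is a list of per-sample expression vectors that share the
    same probe order.
    """
    n = len(arrays)
    return [
        statistics.fmean(
            pearson(arrays[i], arrays[j]) for j in range(n) if j != i
        )
        for i in range(n)
    ]


def flag_outliers(arrays, n_sd=2.0):
    """Indices of arrays whose mean IAC is more than `n_sd` sample
    standard deviations below the average mean IAC.

    `n_sd` is an illustrative cutoff, not a value from the paper.
    """
    iac = mean_iac(arrays)
    center = statistics.fmean(iac)
    spread = statistics.stdev(iac)
    return [i for i, value in enumerate(iac) if value < center - n_sd * spread]
```

With only a handful of arrays, a single outlier cannot sit far below the mean in standard deviation units, so a looser cutoff (for example `n_sd=1.5`) may be needed for small batches. In practice the check is often repeated after removing flagged arrays, since one poor array lowers the mean correlation of all the others.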


Summary

Introduction

Great strides have been made in the development of gene expression profiling technologies that can accommodate partially degraded mRNA samples (Fan et al, 2004; April et al, 2009). These technologies are especially useful in assaying gene expression levels from unique tissue sources, such as the brain, where the conditions for the preservation of mRNA quality are not typically ideal (Mirnics and Pevsner, 2004). As relevant and sophisticated as gene expression assays that can accommodate partially degraded mRNA may be, applying them requires appropriate methods for handling and preprocessing the resulting data, to ensure that the samples have been assayed properly and that residual effects of the degraded RNA are minimal. Batch effects and issues of antemortem conditions documented by medical records that are often associated with

