Abstract
Background: Quantitative real-time PCR (qPCR) is one of the most widely used methods to measure gene expression. An important aspect of qPCR data that has been largely ignored is the presence of non-detects: reactions failing to exceed the quantification threshold and therefore lacking a measurement of expression. While most current software replaces these non-detects with a value representing the limit of detection, this introduces substantial bias in the estimation of both absolute and differential expression. Single imputation procedures, while an improvement on previously used methods, underestimate residual variance, which can lead to anti-conservative inference.
Results: We propose to treat non-detects as non-random missing data, model the missing data mechanism, and use this model to impute missing values or obtain direct estimates of model parameters. To account for the uncertainty inherent in the imputation, we propose a multiple imputation procedure, which provides a set of plausible values for each non-detect. We assess the proposed methods via simulation studies and demonstrate the applicability of these methods to three experimental data sets. We compare our methods to mean imputation, single imputation, and a penalized EM algorithm incorporating non-random missingness (PEMM). The developed methods are implemented in the R/Bioconductor package nondetects.
Conclusions: The statistical methods introduced here reduce discrepancies in gene expression values derived from qPCR experiments in the presence of non-detects, providing increased confidence in downstream analyses.
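The bias from substituting the limit of detection can be illustrated with a small simulation. This is a minimal Python sketch, not the nondetects package and not the model from the paper: the true mean (33), standard deviation (1.5), hard cutoff (35), and substituted value (40) are all assumed for illustration, and a hard cutoff is a deliberately crude stand-in for a non-random missing data mechanism.

```python
import random
import statistics


def simulate_substitution_bias(n=10000, true_mean=33.0, sd=1.5,
                               cutoff=35.0, lod=40.0, seed=0):
    """Compare the mean of true Cq values with the mean obtained after
    replacing non-detects by an assumed limit-of-detection value."""
    rng = random.Random(seed)
    # Hypothetical "true" Cq values for one gene across n reactions.
    true_values = [rng.gauss(true_mean, sd) for _ in range(n)]
    # Assumed mechanism: any reaction with Cq above the cutoff is a non-detect
    # and is recorded as the limit of detection instead of its true value.
    substituted = [x if x <= cutoff else lod for x in true_values]
    return statistics.fmean(true_values), statistics.fmean(substituted)


true_avg, substituted_avg = simulate_substitution_bias()
print(f"mean of true Cq values:        {true_avg:.2f}")
print(f"mean after LOD substitution:   {substituted_avg:.2f}")
```

Under these assumptions the substituted mean is pulled upward, toward the limit of detection, which corresponds to underestimating expression for the affected gene.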
Highlights
Quantitative real-time polymerase chain reaction (qPCR) is one of the most widely used methods to measure gene expression.
These new deoxyribonucleic acid (DNA) copies are added to the pool of DNA templates and the process is repeated multiple times, so that amplification occurs by chain reaction [2].
Sherina et al. BMC Bioinformatics (2020) 21:545
By replacing missing values with imputed values, the single imputation (SI) procedure underestimates the residual variance, leading to anti-conservative inference. We address this limitation by developing two new methods to handle qPCR non-detects: (1) direct estimation of the mean and variance of gene expression using maximum likelihood estimation (DirEst) and (2) a multiple imputation (MI) procedure that models three sources of variability: uncertainty in the missing data mechanism, uncertainty in the parameter estimates, and measurement error.
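The variance argument can be made concrete with Rubin's rules, a standard way to pool results across multiply imputed data sets. The text above does not state which pooling rule the authors use, so this is a sketch under that assumption; the function name `pool_rubin` and the input numbers are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules.

    estimates: one point estimate per imputed data set
    variances: the squared standard error of each estimate
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = statistics.fmean(estimates)
    # Within-imputation variance: average of the per-imputation variances.
    within = statistics.fmean(variances)
    # Between-imputation variance: sample variance of the point estimates.
    between = statistics.variance(estimates)
    # Total variance adds the between-imputation component, inflated by 1/m.
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up numbers: three imputations of a difference in mean Cq.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

A single imputation yields only one estimate, so the between-imputation term cannot be estimated and the reported variance is the within-imputation part alone. That is the sense in which single imputation understates uncertainty.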
Summary
Quantitative real-time PCR (qPCR) is one of the most widely used methods to measure gene expression. Oligonucleotides complementary to each of the two possible sequences relating to the sense and anti-sense strands of the target DNA are included in the reaction, allowing both strands to be amplified simultaneously. These new DNA copies are added to the pool of DNA templates and the process is repeated multiple times, so that amplification occurs by chain reaction [2]. Cq values are either related to a known set of copy number standards or a control gene (absolute quantification) [4, 5] or to the Cq value of the same target in another sample (relative quantification) [6].
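As an illustration of relative quantification, the widely used delta-delta-Cq calculation can be sketched as follows. This is a generic example rather than the procedure from the paper: it assumes the target roughly doubles each cycle (amplification base 2), normalizes to a reference gene, and uses made-up Cq values. The function name `fold_change` is hypothetical.

```python
def fold_change(cq_target_sample, cq_ref_sample,
                cq_target_calibrator, cq_ref_calibrator,
                amplification_base=2.0):
    """Relative expression of a target in a sample versus a calibrator sample.

    Each target Cq is first normalized to a reference gene measured in the
    same sample (delta Cq), then the two delta Cq values are compared.
    """
    delta_sample = cq_target_sample - cq_ref_sample
    delta_calibrator = cq_target_calibrator - cq_ref_calibrator
    delta_delta = delta_sample - delta_calibrator
    # A lower Cq means more starting template, hence the negative exponent.
    return amplification_base ** (-delta_delta)


# Made-up Cq values: the target crosses the threshold two cycles earlier
# in the sample than in the calibrator, with equal reference gene Cq.
print(fold_change(24.0, 18.0, 26.0, 18.0))  # 4.0
```

A non-detect has no Cq value to put into this calculation, so how it is handled directly affects the resulting fold change.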