False-positive Threshold Research Articles

A typical hypothesis-testing task in statistical analysis is to determine whether a pattern is significantly associated with a specific class label. In big data mining this becomes a highly challenging multiple-hypothesis testing problem: large-scale exploratory data analysis may involve millions or billions of tests, which can produce a large number of false positive results. The permutation-testing-based family-wise error rate (FWER) control method (PFWER) is theoretically effective for multiple hypothesis testing. In practice, however, it suffers from a serious computational efficiency problem: computing an appropriate FWER false-positive control threshold with PFWER takes so long that it is nearly impossible to finish in a reasonable time on medium- or large-scale data. Although some methods for speeding up this threshold calculation have been proposed, most of them run on a single machine (stand-alone), leaving considerable room for improvement. To address this problem, this paper proposes a distributed method for calculating the PFWER false-positive threshold on large-scale data, which is significantly more efficient than current approaches. First, the FP-growth algorithm is used for pattern mining; during mining, pruning operations and an index optimization that merges patterns with their indexed transactions reduce the computation spent on invalid patterns. Distributed computing is then introduced on this basis: the constructed FP-tree is decomposed into a set of subtrees, each corresponding to a subtask, and the subtasks are distributed to different computing nodes. Each node independently calculates a local significance threshold for its assigned subtasks. Finally, all local results are aggregated to compute the FWER false-positive control threshold, which is exactly consistent with the theoretical result. Experiments on 11 real-world datasets demonstrate that the proposed distributed algorithm significantly improves the computational efficiency of PFWER while preserving its theoretical accuracy.
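
The abstract describes the aggregation step only at a high level. The sketch below illustrates a standard permutation-based FWER threshold (in the spirit of the Westfall-Young min-p approach) and one way to distribute it. It rests on several assumptions that are not taken from the paper: patterns are given as already-mined support sets (FP-growth itself is not implemented), association is scored with a one-sided Fisher exact test, patterns are split round-robin as a stand-in for FP-tree subtrees, and each node returns its per-permutation minimum p-values rather than a single local threshold. The names `fisher_upper_p`, `local_min_pvalues`, and `pfwer_threshold` are hypothetical.

```python
import bisect
import math
import random
from typing import List, Sequence


def fisher_upper_p(a: int, x: int, n1: int, n: int) -> float:
    """One-sided Fisher exact p-value P(K >= a), K ~ Hypergeometric(n, n1, x).

    n: transactions, n1: transactions with label 1,
    x: pattern support, a: label-1 transactions that contain the pattern.
    """
    upper = min(x, n1)
    hits = sum(math.comb(n1, k) * math.comb(n - n1, x - k) for k in range(a, upper + 1))
    return hits / math.comb(n, x)


def local_min_pvalues(subtask: Sequence[Sequence[int]],
                      perms: Sequence[Sequence[int]], n1: int, n: int) -> List[float]:
    """Work of one hypothetical node: for every permutation, the smallest
    p-value over the patterns in its subtask."""
    mins = []
    for labels in perms:
        best = 1.0
        for support in subtask:
            a = sum(labels[t] for t in support)
            best = min(best, fisher_upper_p(a, len(support), n1, n))
        mins.append(best)
    return mins


def pfwer_threshold(patterns: Sequence[Sequence[int]], labels: Sequence[int],
                    alpha: float = 0.05, n_perm: int = 1000,
                    n_nodes: int = 4, seed: int = 0) -> float:
    rng = random.Random(seed)
    n, n1 = len(labels), sum(labels)
    # Shared permutations: every node must test against the same label shuffles.
    perms = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        perms.append(shuffled)

    # Round-robin split of patterns (assumed stand-in for FP-tree subtrees).
    subtasks = [patterns[i::n_nodes] for i in range(n_nodes)]
    local = [local_min_pvalues(s, perms, n1, n) for s in subtasks if s]

    # Aggregate: the global minimum per permutation is the min of the local minima.
    global_min = sorted(min(col) for col in zip(*local))

    # Largest observed minimum p-value whose empirical FWER stays <= alpha.
    threshold = 0.0
    for c in sorted(set(global_min)):
        if bisect.bisect_right(global_min, c) <= alpha * n_perm:
            threshold = c
        else:
            break
    return threshold


if __name__ == "__main__":
    # Toy data: 10 transactions, patterns given by the transaction indices that contain them.
    labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    patterns = [[0, 1, 2], [0, 1, 2, 3], [4, 5], [1, 5, 7], [2, 3, 8, 9]]
    single = pfwer_threshold(patterns, labels, n_perm=200, n_nodes=1)
    distributed = pfwer_threshold(patterns, labels, n_perm=200, n_nodes=3)
    print(single, distributed, single == distributed)
```

Because taking a minimum is associative, the minimum of the local minima equals the per-permutation minimum a single machine would compute, so the distributed threshold matches the stand-alone one exactly. The paper's own choice of test statistic and of what each node sends back may differ.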

Current studies indicate that long-term exposure to ambient fine particulate matter (PM2.5) is associated with global mortality, yet no studies have explored the relationships of PM2.5 and its species with DNA methylation (DNAm) PhenoAge acceleration (DNAmPhenoAccel), a new epigenetic biomarker of phenotypic age. We identified which PM2.5 species were associated with DNAmPhenoAccel over a one-year exposure window in a longitudinal cohort. We collected whole blood samples from 683 elderly men in the Normative Aging Study between 1999 and 2013 (n = 1254 visits). DNAm PhenoAge was calculated using 513 CpGs retrieved from the Illumina Infinium HumanMethylation450 BeadChip. Daily concentrations of PM2.5 species were measured at a fixed air-quality monitoring site, and one-year moving averages were computed. Linear mixed-effect (LME) regression and Bayesian kernel machine (BKM) regression were used to estimate the associations. Covariates included chronological age, body mass index (BMI), cigarette pack-years, smoking status, estimated cell types, and batch effects, among others. The Benjamini-Hochberg procedure, controlling the false discovery rate at 5%, was used to adjust for multiple comparisons. During the study period, the mean DNAm PhenoAge and mean chronological age of our subjects were 68 and 73 years, respectively. In the LME model, only lead and calcium were significantly associated with DNAmPhenoAccel. For example, an interquartile range (IQR, 0.0011 μg/m³) increase in lead was associated with a 1.29-year [95% confidence interval (CI): 0.47, 2.11] increase in DNAmPhenoAccel. Using the BKM model, we selected PM2.5, lead, and silicon as predictors of DNAmPhenoAccel. A subsequent LME model showed that only lead had a significant effect: a 1.45-year (95% CI: 0.46, 2.46) increase in DNAmPhenoAccel per IQR increase in one-year average lead. This is the first study to investigate the long-term effects of PM2.5 components on DNAmPhenoAccel. The results demonstrate that lead and calcium contained in PM2.5 were robustly associated with DNAmPhenoAccel.
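
The Benjamini-Hochberg adjustment used in this study is a step-up procedure: sort the m p-values, find the largest rank k whose p-value satisfies p(k) ≤ (k/m)·q, and reject the k smallest. The sketch below is a generic illustration of that rule at q = 0.05, not the study's analysis code; the function name `benjamini_hochberg` is hypothetical and the p-values in the usage line are made up.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], q: float = 0.05) -> List[bool]:
    """Return a rejection flag for each p-value, controlling the FDR at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:  # BH criterion for this rank
            k_max = rank                # keep the largest passing rank (step-up)
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values for four hypothetical comparisons.
    print(benjamini_hochberg([0.015, 0.02, 0.036, 0.5]))  # [True, True, True, False]
```

In the usage line, 0.015 exceeds its own cutoff (1/4 × 0.05 = 0.0125) but is still rejected, because a larger rank (0.036 ≤ 0.0375) passes; this step-up behaviour is what distinguishes Benjamini-Hochberg from testing each p-value against its cutoff independently.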

Articles published on False-positive Threshold

Efficient False Positive Control Algorithms in Big Data Mining

Three-dimensional reconstruction and virtual reposition of fragments compared to two dimensional measurements of midshaft clavicle fracture shortening

A-12 Racial Differences in Performance Validity Test Failure Rates

A-184 Performance Validity Test Failure Rates in Bilingual Individuals Evaluated in English

Associations of annual ambient PM2.5 components with DNAm PhenoAge acceleration in elderly men: The Normative Aging Study

Assessing the Accuracy of Variant Detection in Cost-Effective Gene Panel Testing by Next-Generation Sequencing

Detection of and Compensation for EMG Disturbances for Powered Lower Limb Prosthesis Control.

Applying label-free quantitation to top down proteomics.
