High-dimensional Datasets Research Articles

Background: Mediation analysis is a powerful tool for identifying factors that mediate the causal pathway from exposure to health outcomes. It has been extended to study a large number of potential mediators in high-dimensional data settings. Confounding is inevitable in observational studies, so adjusting for potential confounders is an essential part of high-dimensional mediation analysis (HDMA). Although propensity score (PS)-based methods such as propensity score regression adjustment (PSR) and inverse probability weighting (IPW) have been proposed to tackle this problem, extreme propensity score distributions can bias the estimates these methods produce.

Methods: In this article, we integrated the overlap weighting (OW) technique into the HDMA workflow and proposed a concise and powerful high-dimensional mediation analysis procedure consisting of OW confounding adjustment, sure independence screening (SIS), de-biased Lasso penalization, and joint-significance testing under the mixture null distribution. We compared the proposed method with the existing method consisting of PS-based confounding adjustment, SIS, minimax concave penalty (MCP) variable selection, and classical joint-significance testing.

Results: Simulation studies demonstrate that the proposed procedure has the best performance in mediator selection and estimation. It yielded the highest true positive rate, an acceptable false discovery proportion, and lower mean squared error. In the empirical study, which applied the proposed method to the GSE117859 dataset in the Gene Expression Omnibus database, we found that smoking history may reduce the estimated natural killer (NK) cell level through the mediation effect of some methylation markers, mainly methylation sites cg13917614 in the CNP gene and cg16893868 in the LILRA2 gene.

Conclusions: The proposed method has higher power, sufficient false discovery rate control, and precise mediation effect estimation. It is also feasible to implement in the presence of confounders. Hence, our method is worth considering in HDMA studies.
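The contrast between overlap weighting and inverse probability weighting under extreme propensity scores can be illustrated with a minimal sketch. It is not the authors' implementation: the propensity scores are hypothetical values supplied by hand rather than estimated from data, and the function names are chosen for illustration. It uses the standard definitions, in which an exposed unit receives weight 1 - e(x) under OW and 1 / e(x) under IPW, and an unexposed unit receives e(x) under OW and 1 / (1 - e(x)) under IPW.

```python
def overlap_weights(exposed, propensity):
    """Overlap weight: 1 - e(x) for exposed units, e(x) for unexposed units."""
    return [1.0 - e if a == 1 else e for a, e in zip(exposed, propensity)]


def inverse_probability_weights(exposed, propensity):
    """IPW weight: 1 / e(x) for exposed units, 1 / (1 - e(x)) for unexposed."""
    return [1.0 / e if a == 1 else 1.0 / (1.0 - e)
            for a, e in zip(exposed, propensity)]


def weight_ratio(weights):
    """Ratio of the largest to the smallest weight (a crude imbalance measure)."""
    return max(weights) / min(weights)


# Hypothetical exposure indicators and propensity scores (assumed values,
# not taken from GSE117859). Units 1 and 4 have extreme scores.
exposed = [1, 1, 0, 0]
propensity = [0.02, 0.60, 0.40, 0.98]

ow = overlap_weights(exposed, propensity)
ipw = inverse_probability_weights(exposed, propensity)

# Overlap weights always lie in [0, 1]; IPW weights grow without bound
# as the propensity score approaches 0 (exposed) or 1 (unexposed).
print("OW weights: ", ow, "max/min ratio:", weight_ratio(ow))
print("IPW weights:", ipw, "max/min ratio:", weight_ratio(ipw))
```

Because overlap weights are bounded, no single unit with an extreme score can dominate a weighted estimate, which is the property the abstract relies on when it replaces the PS-based adjustment.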

Modeling ecological patterns and processes often involves large-scale, complex, high-dimensional spatial data. Because ecological data are nonlinear and multicollinear, traditional geostatistical methods face great challenges in model accuracy. As machine learning has increased our ability to build models on big data, this study proposes statistical models that hybridize machine learning and spatial interpolation methods to cope with increasingly large-scale and complex ecological data. Two machine learning algorithms, boosted regression tree (BRT) and least absolute shrinkage and selection operator (LASSO), were combined with ordinary kriging (OK) to model plant invasions across the eastern United States. The accuracies of the hybrid models and conventional models were evaluated by 10-fold cross-validation. Based on an invasive plants dataset of 15 ecoregions across the eastern United States, the hybrid algorithms predicted plant invasion significantly better than commonly used algorithms in terms of RMSE and a paired-samples t-test (p < .0001). In addition, the combined algorithms can select influential variables associated with the establishment of invasive cover, which conventional geostatistics cannot do. Higher accuracy in predicting large-scale biological invasions improves our understanding of the ecological conditions that lead to the establishment and spread of plants into novel habitats across spatial scales. The results demonstrate the effectiveness and robustness of the hybrid BRTOK and LASOK models, which can be used to analyze large-scale and high-dimensional spatial datasets and which offer an alternative set of models for spatial interpolation of ecological properties. They will also provide a better basis for management decisions in early-detection modeling of invasive species.
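The general idea of hybridizing a covariate model with ordinary kriging can be sketched as follows: fit a trend on the covariates, krige the residuals in space, and add the two predictions. This is an illustration under stated assumptions, not the BRTOK or LASOK models from the abstract. Ordinary least squares stands in for BRT and LASSO, the exponential covariance parameters (`sill`, `corr_range`) are assumed rather than fitted to a variogram, and all data are synthetic.

```python
import numpy as np


def exp_cov(dist, sill=1.0, corr_range=2.0):
    """Exponential covariance as a function of distance (assumed parameters)."""
    return sill * np.exp(-dist / corr_range)


def ordinary_kriging(coords, values, targets, sill=1.0, corr_range=2.0):
    """Ordinary kriging predictions at `targets` from observed `values`."""
    n = len(values)
    d_obs = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_tgt = np.linalg.norm(coords[:, None, :] - targets[None, :, :], axis=-1)
    # Kriging system with a Lagrange multiplier forcing weights to sum to 1.
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = exp_cov(d_obs, sill, corr_range)
    lhs[n, n] = 0.0
    rhs = np.ones((n + 1, targets.shape[0]))
    rhs[:n, :] = exp_cov(d_tgt, sill, corr_range)
    weights = np.linalg.solve(lhs, rhs)[:n, :]
    return weights.T @ values


def hybrid_predict(coords, X, y, targets, X_targets):
    """Trend from covariates (OLS stand-in) plus kriged residuals."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    trend = np.column_stack([np.ones(len(X_targets)), X_targets]) @ beta
    return trend + ordinary_kriging(coords, residuals, targets)


# Synthetic example: 30 sites with 2-D coordinates and 3 covariates.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(30, 2))
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + np.sin(coords[:, 0]) + rng.normal(0, 0.1, 30)

new_coords = rng.uniform(0.0, 10.0, size=(5, 2))
new_X = rng.normal(size=(5, 3))
print(hybrid_predict(coords, X, y, new_coords, new_X))
```

With no nugget term, the kriging step interpolates the residuals exactly at observed sites, so in-sample predictions reproduce the data. Accuracy therefore has to be judged on held-out sites, which is why the abstract reports 10-fold cross-validation.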

Articles published on High-dimensional Datasets

Consistency approximation: Incremental feature selection based on fuzzy rough set theory

Cross-spectrum method for acoustic source identification and visualization of airfoil noise

Visually exploring canonical correlation patterns of high-dimensional industrial control datasets based on multi-sensor fusion

A two-stage clonal selection algorithm for local feature selection on high-dimensional data

Robust quantum federated learning with noise

High-dimensional mediation analysis for continuous outcome with confounders using overlap weighting method in observational epigenetic study

Spatial prediction of plant invasion using a hybrid of machine learning and geostatistical method.

The Effects of Shocks on the Real Economy in Romania. A Bayesian FAVAR Approach

Evaluation of sequential feature selection in improving the K-nearest neighbor classifier for diabetes prediction

Efficient Feature Clustering for High-Dimensional Datasets: A Non-Parametric Approach

Exploring Dimensionality Reduction Techniques for Improved Breast Cancer Diagnosis

Advancing Nanomaterial Toxicology Screening Through Efficient and Cost-Effective Quantitative Proteomics.

GHOST: Graph-based higher-order similarity transformation for classification

A Knowledge-Guided Competitive Co-Evolutionary Algorithm for Feature Selection

Evaluation of High-Dimensional Data Classification for Skin Malignancy Detection Using DL-Based Techniques

Improving golden jackel optimization algorithm: An application of chemical data classification

Sydney’s residential relocation landscape: Machine learning and feature selection methods unpack the whys and whens

Improved Kepler Optimization Algorithm for enhanced feature selection in liver disease classification

An Improved Feature Removal Approach for Classification of High Dimensional Feature Dataset

A Hybrid Rider Optimization with Deep Learning Driven Intrusion Detection Farmwork in Wireless Sensor Network
