Imputation Approach Research Articles

Mass-spectrometry-based proteomics frequently uses label-free quantification strategies because of their cost-effectiveness, methodological simplicity, and capability to identify large numbers of proteins within a single analytical run. Despite these advantages, missing values (MV), which can affect up to 50% of the data matrix, pose a significant challenge by reducing the accuracy, reproducibility, and interpretability of the results. Consequently, effective handling of missing values is crucial for reliable quantitative analysis in proteomic studies. This study systematically evaluated the performance of selected imputation methods for addressing missing values in proteomic datasets. Two protein identification algorithms, FragPipe and MaxQuant, were employed to generate datasets, enabling an assessment of their influence on imputation efficacy. Ten imputation methods were analyzed, representing three methodological categories: single-value (LOD, ND, SampMin), local-similarity (kNN, LLS, RF), and global-similarity (LSA, BPCA, PPCA, SVD) approaches. The study also investigated the impact of data logarithmization on imputation performance. The evaluation was conducted in two stages. First, performance metrics, including the normalized root mean square error (NRMSE) and the area under the receiver operating characteristic (ROC) curve (AUC), were applied to datasets with artificially introduced missing values. The datasets were designed to mimic varying MV rates (10%, 25%, 50%) and proportions of values missing not at random (MNAR) (0%, 20%, 40%, 80%, 100%). This step enabled an assessment of how data characteristics affect the relative effectiveness of the imputation methods. Second, the imputation strategies were applied to real proteomic datasets containing natural missing values, focusing on the true-positive (TP) classification of proteins to evaluate their practical utility. 
The findings highlight that local-similarity-based methods, particularly random forest (RF) and local least-squares (LLS), consistently exhibit robust performance across varying MV scenarios. Furthermore, data logarithmization significantly enhances the effectiveness of global-similarity methods, suggesting it as a beneficial preprocessing step prior to imputation. The study underscores the importance of tailoring imputation strategies to the specific characteristics of the data to maximize the reliability of label-free quantitative proteomics. Interestingly, while the choice of protein identification algorithm (FragPipe vs. MaxQuant) had minimal influence on the overall imputation error, differences in the number of proteins classified as true positives revealed more nuanced effects, emphasizing the interplay between imputation strategies and downstream analysis outcomes. These findings provide a comprehensive framework for improving the accuracy and reproducibility of proteomic analyses through an informed selection of imputation approaches.
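The first evaluation stage described above (masking known values, imputing them, and scoring the result with NRMSE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the data are synthetic log2-like intensities, MNAR is approximated by removing the lowest values, the imputer is a simple per-sample minimum standing in for the single-value category, and NRMSE is taken as the root of the mean squared error over masked entries divided by the variance of the true masked values. The function names `make_missing`, `impute_sample_min`, and `nrmse` are hypothetical.

```python
import numpy as np


def make_missing(x, mv_rate, mnar_frac, rng):
    """Return a boolean mask hiding `mv_rate` of the entries of `x`.

    A share `mnar_frac` of the hidden entries is MNAR (assumed here to be
    the lowest intensities); the rest is removed uniformly at random.
    """
    n_total = x.size
    n_missing = int(round(mv_rate * n_total))
    n_mnar = int(round(mnar_frac * n_missing))
    mask = np.zeros(n_total, dtype=bool)
    if n_mnar:
        # MNAR stand-in: hide the smallest values (left-censoring).
        mask[np.argsort(x.ravel())[:n_mnar]] = True
    n_random = n_missing - n_mnar
    if n_random:
        # Random part: hide entries chosen uniformly from those still visible.
        visible = np.flatnonzero(~mask)
        mask[rng.choice(visible, size=n_random, replace=False)] = True
    return mask.reshape(x.shape)


def impute_sample_min(x_missing):
    """Replace each NaN with the minimum observed value of its column (sample)."""
    col_min = np.nanmin(x_missing, axis=0)
    return np.where(np.isnan(x_missing), col_min, x_missing)


def nrmse(x_true, x_imputed, mask):
    """NRMSE over the masked entries (assumed definition, see lead-in)."""
    err = x_imputed[mask] - x_true[mask]
    return float(np.sqrt(np.mean(err ** 2) / np.var(x_true[mask])))


rng = np.random.default_rng(0)
# Synthetic matrix: 200 proteins (rows) x 6 samples (columns).
log_intensity = rng.normal(loc=25.0, scale=2.0, size=(200, 6))

results = {}
for mnar_frac in (0.0, 0.5, 1.0):
    mask = make_missing(log_intensity, mv_rate=0.25, mnar_frac=mnar_frac, rng=rng)
    x_missing = np.where(mask, np.nan, log_intensity)
    x_imputed = impute_sample_min(x_missing)
    results[mnar_frac] = nrmse(log_intensity, x_imputed, mask)
    print(f"MNAR share {mnar_frac:.0%}: NRMSE = {results[mnar_frac]:.3f}")
```

Swapping `impute_sample_min` for another imputer while keeping the same masks would allow a like-for-like comparison across MV rates and MNAR proportions.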

Background: Early detection of Alzheimer’s disease (AD) is essential for timely management and consideration of therapeutic options; therefore, detecting the risk of conversion from mild cognitive impairment (MCI) to AD is crucial during neurodegenerative progression. Existing neuroimaging studies have mostly focused on group differences between individuals with MCI (or AD) and cognitively normal (CN) individuals, discarding the temporal information of conversion time. Here, we aimed to develop a prognostic model for AD conversion using functional connectivity (FC) and Cox regression suitable for conversion event modeling.
Methods: We developed a prognostic model using a large-scale Alzheimer’s Disease Neuroimaging Initiative dataset and validated it using external data obtained from the Open Access Series of Imaging Studies. We considered individuals who were initially CN or had MCI but progressed to AD and those with MCI with no progression to AD during the five-year follow-up period. As the exact conversion time to AD is unknown, we inferred this information using imputation approaches. We generated cortex-wide principal FC gradients using manifold learning techniques and computed subcortical-weighted manifold degrees from baseline functional magnetic resonance imaging data. A penalized Cox regression model with an elastic net penalty was adopted to define a risk score predicting the risk of conversion to AD, using FC gradients and clinical factors as regressors.
Results: Our prognostic model predicted the conversion risk and confirmed the role of imaging-derived manifolds in the conversion risk. The brain regions that contributed most to predicting AD conversion were the heteromodal association and visual cortices, as well as the caudate and hippocampus. Our risk score based on Cox regression was consistent with the expected disease trajectories and correlated with positron emission tomography tracer uptake and symptom severity, reinforcing its clinical usefulness. Our findings were validated using an independent dataset. The cross-sectional application of our model showed a higher risk for AD than for MCI, which correlated with symptom severity scores in the validation dataset.
Conclusion: We proposed a prognostic model predicting the risk of conversion to AD. The associated risk score may provide insights for early intervention in individuals at risk of AD conversion.
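For readers unfamiliar with the model class named in the Methods, an elastic-net-penalized Cox regression is commonly written as below. This is the generic textbook form, not necessarily the exact parameterization used by the authors; all symbols are assumptions introduced here: $x_i$ is the regressor vector of subject $i$ (for example FC gradients and clinical factors), $\delta_i$ indicates an observed conversion, $R(t_i)$ is the set of subjects still at risk at time $t_i$, $\lambda \ge 0$ is the penalty strength, and $\alpha \in [0, 1]$ balances the lasso and ridge terms.

```latex
\hat{\beta} = \arg\max_{\beta}
  \left[
    \sum_{i:\,\delta_i = 1}
      \left( x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right) \right)
    - \lambda \left( \alpha \lVert \beta \rVert_1
      + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right)
  \right],
\qquad
r_i = x_i^{\top}\hat{\beta}
```

Under this form, the risk score $r_i$ is the linear predictor of the fitted model, and a larger $r_i$ corresponds to a higher estimated hazard of conversion.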

Articles published on Imputation Approach

Comprehensive Evaluation of Advanced Imputation Methods for Proteomic Data Acquired via the Label-Free Approach

Uni-to-Multi Modal Knowledge Distillation for Bidirectional LiDAR-Camera Semantic Segmentation.

WBDI Approach for Univariate Time Series Imputation

Imputation Methods for Missing Values in Estimation of Population Mean under Diagonal Systematic Sampling Scheme

Data discretization impact on deep learning for missing value imputation of continuous data

A novel MissForest-based missing values imputation approach with recursive feature elimination in medical applications

Implementation of PPCA Imputation, SMOTE-N Class Balancing in Hepatitis Classification Using Naïve Bayes

Assessing the impact of missing data in youth overweight and obesity research: complete case analysis versus multiple imputation

Comparing imputation approaches for immigration status in ED visits: Implications for using electronic medical records.

Estimating the impact of social distance policy in mitigating COVID-19 spread with factor-based imputation approach

ReMiND: Recovery of missing neuroimaging using diffusion models with application to Alzheimer’s disease

The association between tobacco use and COVID-19 diagnoses in three Nordic countries: a pooled analysis.

Prognostic model for predicting Alzheimer’s disease conversion using functional connectome manifolds

Low-Rank Tensor and Hybrid Smoothness Regularization-Based Approach for Traffic Data Imputation With Multimodal Missing

Multiple imputation of missing data in large studies with many variables: A fully conditional specification approach using partial least squares.

Enhanced prediction of agricultural CO2 emission using ensemble machine learning-based imputation approach

ScDTL: enhancing single-cell RNA-seq imputation through deep transfer learning with bulk cell information.

Systematically missing data in distributed data networks: multiple imputation when data cannot be pooled

Autoencoder imputation of missing heterogeneous data for Alzheimer's disease classification

Addressing the implementation challenge of risk prediction model due to missing risk factors: The submodel approximation approach.
