Ameliorative missing value imputation for robust biological knowledge inference

Muhammad Shoaib B Sehgal, Iqbal Gondal, Laurence S Dooley, Ross Coppel

doi:10.1016/j.jbi.2007.10.005


Journal: Journal of Biomedical Informatics
Publication Date: Dec 31, 2007
Citations: 51
License type: elsevier-specific: oa user license

Affiliation: Melbourne Bioinformatics, Monash University

Abstract

Gene expression data are widely used in post-genomic analyses. The data are often collected using microarrays because these can measure the expression of thousands of genes simultaneously. Expression data, however, contain significant numbers of missing values, which can affect subsequent biological analysis. To minimize the impact of these missing values, several imputation algorithms have been proposed, including Collateral Missing Value Estimation (CMVE), Bayesian Principal Component Analysis (BPCA), Least Square Impute (LSImpute), Local Least Square Impute (LLSImpute), and K-Nearest Neighbour (KNN). These algorithms, however, exploit only the global or only the local correlation structure of the data, which can lead to higher estimation errors. This paper presents an Ameliorative Missing Value Imputation (AMVI) technique that can exploit global/local and positive/negative correlations in a given dataset by automatically selecting the optimal number of predictor genes k using a non-parametric wrapper method based on Monte Carlo simulations. AMVI has the CMVE strategy at its core because CMVE, which exploits positive/negative correlations, has demonstrated improved performance compared with both low-variance methods such as BPCA and LLSImpute and high-variance methods such as KNN and ZeroImpute. The performance of AMVI is compared with that of CMVE, BPCA, LLSImpute, and KNN by randomly removing between 1% and 15% of the values in eight different ovarian cancer, breast cancer, and yeast datasets. Together with the standard normalized root mean square (NRMS) error metric, the True Positive (TP) rate of significant gene selection, the biological significance of the selected genes, and statistical significance test results are presented to investigate the impact of missing values on subsequent biological analysis.
The enhanced performance of AMVI was demonstrated by its lower NRMS error, improved TP rate, the biological significance of the selected genes, and the statistical significance test results when compared with the aforementioned imputation methods across all the datasets. The results show that AMVI adapted to the latent correlation structure of the data and proved to be an effective and robust approach compared with the trial-and-error methodology for selecting k. The results confirmed that AMVI can be successfully applied to accurately impute missing values prior to any microarray data analysis.
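
The evaluation protocol and the idea of choosing k by simulation can be illustrated with a minimal Python sketch. This is not the AMVI or CMVE algorithm: it substitutes a plain KNN imputer (neighbour averaging) for the CMVE core, and it assumes the NRMS error is the RMS of the estimation error over the removed entries divided by the RMS of the true values at those entries. The function names `knn_impute`, `nrms_error`, and `select_k`, and the parameters `rate` and `trials`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def knn_impute(data, k):
    """Fill each NaN with the mean of that column over the k nearest genes (rows).

    Plain KNN imputation, used here as a stand-in; it is NOT the CMVE estimator.
    """
    filled = data.copy()
    observed = ~np.isnan(data)
    for i, j in zip(*np.where(~observed)):
        # Candidate neighbours: genes that have an observed value in sample j.
        cand = np.where(observed[:, j])[0]
        dists = []
        for c in cand:
            shared = observed[i] & observed[c]
            if shared.any():
                diff = data[i, shared] - data[c, shared]
                dists.append(np.sqrt(np.mean(diff ** 2)))
            else:
                dists.append(np.inf)
        nearest = cand[np.argsort(np.asarray(dists))[:k]]
        # Fall back to zero if no gene has an observed value in this sample.
        filled[i, j] = data[nearest, j].mean() if nearest.size else 0.0
    return filled


def nrms_error(truth, estimate, missing):
    """Assumed NRMS definition: RMS(truth - estimate) / RMS(truth) on removed entries."""
    err = truth[missing] - estimate[missing]
    return np.sqrt(np.mean(err ** 2)) / np.sqrt(np.mean(truth[missing] ** 2))


def select_k(data, candidates, rate=0.05, trials=5, seed=0):
    """Pick k by repeatedly hiding known values and scoring each candidate k."""
    rng = np.random.default_rng(seed)
    known = ~np.isnan(data)
    scores = {k: [] for k in candidates}
    for _ in range(trials):
        # Hide a random subset of the known entries for this trial.
        hide = known & (rng.random(data.shape) < rate)
        if not hide.any():
            continue
        trial = data.copy()
        trial[hide] = np.nan
        for k in candidates:
            scores[k].append(nrms_error(data, knn_impute(trial, k), hide))
    return min(candidates, key=lambda k: np.mean(scores[k]))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic "expression matrix": 30 genes x 8 samples sharing one pattern plus noise.
    pattern = rng.normal(size=(1, 8))
    genes = pattern * rng.uniform(0.5, 2.0, size=(30, 1)) + 0.1 * rng.normal(size=(30, 8))
    print("selected k:", select_k(genes, candidates=[1, 3, 5]))
```

The wrapper in `select_k` mirrors the abstract's description only at a high level: candidate values of k are scored on entries whose true values are known, so no trial-and-error choice of k on the real missing entries is needed.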


Similar Papers

Gene Expression Imputation Techniques for Robust Post Genomic Knowledge Discovery
Muhammad Shoaib Sehgal ... Laurence Dooley
01 Jan 2008

Collateral Missing Value Estimation: Robust Missing Value Estimation for Consequent Microarray Data Processing
Muhammad Shoaib B Sehgal ... Laurence Dooley
01 Jan 2004

Heuristic Non Parametric Collateral Missing Value Imputation: A Step Towards Robust Post-genomic Knowledge Discovery
Muhammad Shoaib B Sehgal ... Iqbal Gondal
01 Jan 2008

An ensemble based missing value estimation in DNA microarray using artificial neural network
Sujay Saha ... Anupam Ghosh
01 Sep 2016
